Abstract. CRCs have desirable properties for effective error detection. But their software implementation,
which relies on many steps of the polynomial division, is typically slower than other codes such as weaker
checksums. A relevant question is whether there are some particular CRCs that have fast implementation.
In this paper, we introduce such fast CRCs as well as an effective technique to implement them. For these
fast CRCs, even without using table lookup, it is possible either to eliminate or to greatly reduce many steps
of the polynomial division during their computation.

Index Terms. Fast CRC, low-complexity CRC, checksum, error-detection code, Hamming code, period of
polynomial, fast software implementation.

1 INTRODUCTION

This paper considers cyclic redundancy checks (CRCs), which are effective for detecting errors in
communication and computer systems. An $h$-bit CRC is typically generated by a binary polynomial of the
form

$$M(X) = (X + 1) M_1(X) \tag{1}$$

where $M_1(X)$ is a primitive polynomial of degree $h - 1$. Existing CRCs include the CRC-16 generated by
$X^{16} + X^{15} + X^2 + 1 = (X + 1)(X^{15} + X + 1)$, and the CRC-CCITT generated by
$X^{16} + X^{12} + X^5 + 1 = (X + 1)(X^{15} + X^{14} + X^{13} + X^{12} + X^4 + X^3 + X^2 + X + 1)$.

The CRC generated by (1) has the following desirable properties: (a) its maximum length is $2^{h-1} - 1$
bits, (b) its minimum distance is $d = 4$, i.e., all single and double errors are detected, (c) its burst-error
detecting capability is $b = h$, i.e., all error bursts of length up to $h$ bits are detected, and (d) its codewords
have even weights, i.e., all odd numbers of errors are detected. These properties are called the guaranteed
error-detecting capability. The CRC may also detect other errors, but such detection is not guaranteed, e.g.,
it can detect a large percentage of error bursts of length greater than $h$ [2, 12, 18]. General theory and
applications of error-detection codes are presented in [8].

General-purpose computers and compilers are becoming faster and more sophisticated. Software
algorithms are commonly used in operations, modeling, simulations, and performance analysis of systems and
networks. CRC implementation in software is desirable, because many computers do not have hardware
circuits dedicated for CRC computation. However, software implementation of typical CRCs is slow, because
it relies on many steps of the polynomial division during CRC computation. It is this speed limitation of
CRCs that leads to the use of checksums (which are fast and typically do not rely on table lookup) as
alternatives to CRCs in many high-speed networking applications, although checksums are weaker than CRCs.
For example, the 16-bit one's-complement checksum is used in Internet protocol and the Fletcher checksum is
used in ISO [6, 21]. There are also other fast error-detection codes [4, 5, 14, 15], but they do not have all
the desirable properties of CRCs.

A relevant question is whether there is a new family of CRCs that are faster than the existing CRCs. In this
paper, we introduce such CRCs, as well as a technique for their efficient implementation. For these fast
CRCs, it is possible either to eliminate or to greatly reduce many steps of the polynomial division during
their computation.

A common existing technique for reducing the many steps during CRC computation is to use table lookup, 
which requires extra memory [11, 17, 18, 20]. In contrast, even without table lookup, our fast CRCs require 
only a small number of steps for their computation. Algorithms that do not rely on table lookup have an 
advantage of being less dependent on issues such as cache architecture and cache miss. In particular, it is 
possible to use as low as 1.5 operations per input message byte to encode our fast 64-bit CRC (which is 
implemented in C and requires no table lookup). 

This paper, an extension of [16], is organized as follows. In Section 2, we review known facts about CRCs,
which serve as the background for our discussions. We present several different algorithms for computing
CRCs, some of which are designed especially for our fast CRCs. In Section 3, we identify the form of the
generator polynomials for the fast CRCs, and introduce a new technique for their implementation. We
then determine their guaranteed error-detecting capability: the minimum distance, the burst-error-detecting
capability, and the maximum code length. In Section 4, we discuss CRC software complexity and show that
our fast CRCs are typically faster than other CRCs. In Section 5, we present summaries and extensions of
the paper.

1.1 Notation and Convention 

In this paper, we consider polynomials that have binary coefficients 0 and 1. Thus, all polynomial operations
are performed in the binary field GF(2), i.e., by using polynomial arithmetic modulo 2. Let $A(X)$ and
$M(X)$ be 2 polynomials; then $R_{M(X)}[A(X)]$ denotes the remainder polynomial that is obtained when $A(X)$
is divided by $M(X)$. We must have $\operatorname{degree}(R_{M(X)}[A(X)]) < \operatorname{degree}(M(X))$.

An $s$-tuple denotes a block of $s$ bits $A = (a_{s-1}, a_{s-2}, \ldots, a_1, a_0)$, which is also represented by the
binary polynomial $a_{s-1}X^{s-1} + a_{s-2}X^{s-2} + \cdots + a_1 X + a_0$ of degree less than $s$. We use the closely
related notation $A(X)$ to denote this polynomial, i.e., $A$ is composed of the binary coefficients of $A(X)$.
Thus, the tuple $A$ and the polynomial $A(X)$ are equivalent and can be used interchangeably. Typically, the
polynomial notation is used to describe the mathematical properties of codes, whereas the tuple notation is
used to describe the algorithmic properties (such as pseudocodes and computer programs) of codes. If
$Q_1(X)$ and $Q_2(X)$ are an $s_1$-tuple and an $s_2$-tuple, respectively, then the $(s_1 + s_2)$-tuple
$(Q_1(X), Q_2(X))$ denotes the polynomial $Q_1(X)X^{s_2} + Q_2(X)$, which is the concatenation of $Q_2(X)$ to
$Q_1(X)$.

In this paper, we are interested in CRCs that have low software complexity. Software complexity of 
an algorithm refers to the number of operations (i.e., operation count) used to implement the algorithm 
(whereas hardware complexity refers to the number of gates used to implement the algorithm). Suppose 
that we have 2 CRCs that operate under similar environments and use similar types of operations, but one 
CRC requires lower operation count (e.g., having a smaller loop) than the other. It is likely that the CRC 
with lower operation count (i.e., lower software complexity) will result in faster encoding. Thus, complexity 
correlates with speed. However, the amount of the correlation also depends on many other complicating 
factors such as memory speed, cache size, compiler, operating system, pipelining, and CPU architecture. 
A CRC is called "fast" if it has low software complexity and low memory requirement (e.g., it requires no 
lookup table or only a small lookup table). A CRC is called "faster" than another if, for a similar level of 
memory requirement, it has lower software complexity. 

An algorithm (or implementation) is called bitwise if it does not use table lookup. Note that a bitwise 
algorithm does not necessarily involve only bit-by-bit manipulation or computation. Fast checksums are 
typically bitwise. Bitwise algorithms, which do not rely on table lookup, have an advantage of being less 
dependent on issues such as cache architecture, cache miss, and software code space. Ideally, fast CRC 
algorithms should have low complexity and be bitwise. Thus, unless explicitly stated, we focus on bitwise 
algorithms in this paper. Table-lookup algorithms are presented in Appendix A. 

The notation $(k, l, d)$ denotes a systematic code with $k$ = the total bit length of the code, $l$ = the bit
length of the input message, and $d$ = the minimum distance of the code. The burst-error detecting capability
of a code is denoted by $b$. To facilitate cross-references, we label some blocks of text as "Remarks," which
are an integral part of the presentation and should not be viewed as isolated observations or comments.

2 CRC ALGORITHMS 

In this section, we review some known facts about software CRC implementation (e.g., see [2, 5, 7, 11, 14, 
17, 18, 20]). To lay a firm foundation for our later discussions, we present these facts in more precise and 
general forms than those often seen in the literature. Our presentation is a straightforward generalization of 
the results in [18]. 

2.1 General CRC Theory 

Suppose that we use an $h$-bit CRC, generated by a polynomial $M(X)$ of degree $h$, to protect an input
message $U(X)$, which has $l$ bits. By definition, the check polynomial $P(X)$ is the remainder that is obtained
by dividing $U(X)X^h$ by $M(X)$, i.e., $P(X) = R_{M(X)}[U(X)X^h]$. Because computers can process tuples of
bits (e.g., bytes or words) at a time, codes having efficient software implementation should be encoded on
tuples. Typical modern processors can efficiently handle tuples of 8, 16, 32, and 64 bits.

Let $s > 0$ be any positive integer. We can write $l = r + (n - 1)s$, for some $n > 0$ and $0 < r \le s$.
We then process the CRC by dividing the input message $U(X)$ into $n$ tuples. The first tuple has $r$ bits,
and all the other tuples have $s$ bits. Because $r \le s$, we can then insert $(s - r)$ zeros to the left of $U(X)$
to increase its length from $l$ to $l' = l + s - r = ns$, without affecting the CRC computation, because
$R_{M(X)}[(0, 0, \ldots, 0, U(X))X^h] = R_{M(X)}[U(X)X^h] = P(X)$. That is, the first tuple now also has $s$ bits,
the $(s - r)$ left-hand bits of which are always zeros.

Because each tuple $i$ has $s$ bits, it can be represented by a polynomial $Q_i(X)$ of degree $< s$. Thus, the
input message is represented by $U(X) = (Q_0(X), Q_1(X), \ldots, Q_{n-1}(X))$. We emphasize that, for given $h$
and $l$, we are free to choose the value of $s$ (commonly chosen values are $s$ = 8, 16, 32, and 64 bits). As shown
later, the choice of $s$ can have significant impact on CRC speed.

Define $U_i(X) = (Q_0(X), Q_1(X), \ldots, Q_i(X))$ to be the first $i + 1$ input tuples, i.e.,

$$\begin{aligned}
U_0(X) &= Q_0(X) \\
U_1(X) &= (Q_0(X), Q_1(X)) \\
&\;\;\vdots \\
U_{n-1}(X) &= (Q_0(X), Q_1(X), \ldots, Q_{n-1}(X)) = U(X)
\end{aligned}$$

Thus, for $i = 1, 2, \ldots, n - 1$, $U_i(X)$ is determined from $U_{i-1}(X)$ and $Q_i(X)$ by

$$U_i(X) = (U_{i-1}(X), Q_i(X)) = U_{i-1}(X)X^s + Q_i(X) \tag{2}$$

For $i = 0, 1, \ldots, n - 1$, let $P_i(X)$ be the CRC check polynomial for the partial input message $U_i(X)$, i.e.,

$$P_i(X) = R_{M(X)}[U_i(X)X^h] \tag{3}$$

In particular, we have $P_0(X) = R_{M(X)}[Q_0(X)X^h]$, and

$$P_{n-1}(X) = R_{M(X)}[U_{n-1}(X)X^h] = R_{M(X)}[U(X)X^h] = P(X)$$

which is the CRC check polynomial for the entire input message $U(X)$.
Substituting (2) into (3), we have

$$\begin{aligned}
P_i(X) &= R_{M(X)}[U_i(X)X^h] \\
&= R_{M(X)}[(U_{i-1}(X)X^s + Q_i(X))X^h] \\
&= R_{M(X)}[(U_{i-1}(X)X^h)X^s] + R_{M(X)}[Q_i(X)X^h]
\end{aligned}$$

Using (3), we then have

$$\begin{aligned}
P_i(X) &= R_{M(X)}[P_{i-1}(X)X^s] + R_{M(X)}[Q_i(X)X^h] \\
&= R_{M(X)}[P_{i-1}(X)X^s + Q_i(X)X^h]
\end{aligned} \tag{4}$$

for $i = 1, 2, \ldots, n - 1$. Note that (4) is a straightforward generalization of a result in [18], which deals with
the special cases $h = 16$ and $s \in \{8, 16\}$. Thus, the check tuple $P_i(X)$ is computed from $Q_i(X)$ and the
previous check tuple $P_{i-1}(X)$. Recall that $P_0(X) = R_{M(X)}[Q_0(X)X^h]$ and $P(X) = P_{n-1}(X)$ is the CRC
check tuple for $U(X)$. Using (4), $P(X)$ is then computed via the following pseudocode:



    P = 0;
    for (0 <= i < n)
        P = R_M [P X^s + Q_i X^h];
    return P;
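
As a rough illustration (not the paper's own program), the pseudocode above can be sketched in C for the special case $s = 8$ and $h = 16$. The function names `gf2_rem` and `crc16_by_recurrence`, and the convention that bit $k$ of an integer holds the coefficient of $X^k$, are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder of y divided by m over GF(2).
   Bit k of each word is the coefficient of X^k. */
uint64_t gf2_rem(uint64_t y, uint64_t m)
{
    assert(m != 0);
    int dm = 63;
    while (((m >> dm) & 1) == 0)
        dm--;                              /* dm = degree of m */
    for (int k = 63; k >= dm; k--)         /* cancel terms from the top down */
        if ((y >> k) & 1)
            y ^= m << (k - dm);
    return y;
}

/* Recurrence (4) with s = 8, h = 16: P = R_M[P X^8 + Q_i X^16].
   m is the generator mask including its X^16 term. */
uint16_t crc16_by_recurrence(const uint8_t *q, size_t n, uint32_t m)
{
    assert((m >> 16) == 1);                /* generator must have degree 16 */
    uint64_t p = 0;
    for (size_t i = 0; i < n; i++)
        p = gf2_rem((p << 8) ^ ((uint64_t)q[i] << 16), m);
    return (uint16_t)p;
}
```

Under these assumptions, calling `crc16_by_recurrence` on the 9 ASCII bytes of "123456789" with the CRC-CCITT generator (mask `0x11021`) gives `0x31C3`, the commonly published check value for that polynomial with a zero initial value.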



Remark 1. We now review the computational complexity of the polynomial division, which is needed in
CRC computation. Given 2 polynomials $W(X)$ and $Y(X)$, let $V(X) = R_{W(X)}[Y(X)]$ be the remainder
polynomial that is obtained when $Y(X)$ is divided by $W(X)$. Let $w$ and $y$ be the degrees of $W(X)$ and
$Y(X)$, respectively. If $y < w$ (i.e., $y - w + 1 \le 0$), then $V(X) = Y(X)$, i.e., no polynomial division is
needed to obtain the remainder $V(X)$. If $y \ge w$, we then need the polynomial division that requires a loop
of $y - w + 1$ iterations to obtain the remainder $V(X)$ (see [9], p. 421). To summarize, the polynomial "long
division" for computing $R_{W(X)}[Y(X)]$ requires a loop of $\max(0, y - w + 1)$ iterations. □
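
The iteration count in Remark 1 can be made concrete with a small C sketch of the long division. The names `gf2_degree` and `gf2_rem_count` are hypothetical, and polynomials are assumed to fit in a 64-bit word with bit $k$ holding the coefficient of $X^k$.

```c
#include <assert.h>
#include <stdint.h>

/* Degree of a GF(2) polynomial held as a bit mask; -1 for the zero polynomial. */
int gf2_degree(uint64_t p)
{
    int d = -1;
    while (p != 0) {
        p >>= 1;
        d++;
    }
    return d;
}

/* Returns R_W[Y] and stores the number of loop iterations in *iters. */
uint64_t gf2_rem_count(uint64_t y, uint64_t w, int *iters)
{
    assert(w != 0 && iters != 0);
    int dw = gf2_degree(w);
    int dy = gf2_degree(y);
    *iters = 0;
    for (int k = dy; k >= dw; k--) {       /* runs max(0, dy - dw + 1) times */
        if ((y >> k) & 1)
            y ^= w << (k - dw);            /* cancel the X^k term */
        (*iters)++;
    }
    return y;
}
```

For example, dividing $X^{16}$ by $X^{16} + X^2 + X + 1$ takes one iteration and leaves $X^2 + X + 1$, whereas any dividend of degree below 16 is returned unchanged after zero iterations.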






2.2 Two CRC Algorithms 

From (4), we have

$$P_i(X) = R_{M(X)}[(P_{i-1}(X) + Q_i(X)X^{h-s})X^s] \tag{5}$$

if $s \le h$, and

$$P_i(X) = R_{M(X)}[(P_{i-1}(X)X^{s-h} + Q_i(X))X^h] \tag{6}$$

if $s > h$. The CRC algorithms based on (5) and (6), called Algorithm 1 and Algorithm 2, are shown in
Figs. 1 and 2, respectively.

    1  B = 0;
    2  for (0 <= i < n)
    3  {
    4      A = B + Q_i X^(h-s);
    5      B = R_M [A X^s];
    6  }
    7  P = B;
    8  return P;

Fig. 1 CRC Algorithm 1 for computing the check $h$-tuple $P$ from the input $s$-tuples $Q_0, \ldots, Q_{n-1}$ ($s \le h$).
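
A minimal C sketch of Algorithm 1 for $s = 8$ and $h = 16$ follows. It is an illustration under stated assumptions rather than the paper's program: `gf2_rem` and `crc16_algorithm1` are hypothetical names, and bit $k$ of a word is taken to be the coefficient of $X^k$.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder of y divided by m over GF(2). */
uint64_t gf2_rem(uint64_t y, uint64_t m)
{
    assert(m != 0);
    int dm = 63;
    while (((m >> dm) & 1) == 0)
        dm--;                              /* dm = degree of m */
    for (int k = 63; k >= dm; k--)
        if ((y >> k) & 1)
            y ^= m << (k - dm);
    return y;
}

/* Algorithm 1 (Fig. 1) with s = 8 and h = 16; m includes the X^16 term. */
uint16_t crc16_algorithm1(const uint8_t *q, size_t n, uint32_t m)
{
    assert((m >> 16) == 1);
    uint64_t b = 0;                                  /* line 1 */
    for (size_t i = 0; i < n; i++) {                 /* line 2 */
        uint64_t a = b ^ ((uint64_t)q[i] << 8);      /* line 4: A = B + Q_i X^(h-s) */
        b = gf2_rem(a << 8, m);                      /* line 5: B = R_M[A X^s]      */
    }
    return (uint16_t)b;                              /* lines 7-8 */
}
```

With the CRC-CCITT generator mask `0x11021`, this sketch should agree with a direct evaluation of recurrence (4) on any byte string.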



    1  B = 0;
    2  for (0 <= i < n)
    3  {
    4      A = B X^(s-h) + Q_i;
    5      B = R_M [A X^h];
    6  }
    7  P = B;
    8  return P;

Fig. 2 CRC Algorithm 2 for computing the check $h$-tuple $P$ from the input $s$-tuples $Q_0, \ldots, Q_{n-1}$ ($s > h$).
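
Algorithm 2 can be sketched the same way for a case with $s > h$, here $s = 16$ and $h = 8$. The names `gf2_rem` and `crc8_algorithm2` are hypothetical, and the choice of 16-bit input tuples is only an example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder of y divided by m over GF(2). */
uint64_t gf2_rem(uint64_t y, uint64_t m)
{
    assert(m != 0);
    int dm = 63;
    while (((m >> dm) & 1) == 0)
        dm--;                              /* dm = degree of m */
    for (int k = 63; k >= dm; k--)
        if ((y >> k) & 1)
            y ^= m << (k - dm);
    return y;
}

/* Algorithm 2 (Fig. 2) with s = 16 and h = 8; m includes the X^8 term. */
uint8_t crc8_algorithm2(const uint16_t *q, size_t n, uint32_t m)
{
    assert((m >> 8) == 1);
    uint64_t b = 0;                        /* line 1 */
    for (size_t i = 0; i < n; i++) {       /* line 2 */
        uint64_t a = (b << 8) ^ q[i];      /* line 4: A = B X^(s-h) + Q_i */
        b = gf2_rem(a << 8, m);            /* line 5: B = R_M[A X^h]      */
    }
    return (uint8_t)b;                     /* lines 7-8 */
}
```

Because leading zeros do not change the check tuple, the 9 ASCII bytes of "123456789" can be packed into five 16-bit tuples with one zero byte in front; with generator mask `0x107` the result is `0xF4`, the commonly published check value for that 8-bit polynomial with a zero initial value.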



2.3 Two Alternative CRC Algorithms 

We now present 2 alternative CRC algorithms, which will be applied to our fast CRCs (see Section 3). 

Case 1: $s \le h$. The CRC check polynomial $P_j(X)$ for the partial input message $U_j(X)$ can be divided
into 2 parts as

$$P_j(X) = (P_{j,1}(X), P_{j,2}(X)) = P_{j,1}(X)X^{h-s} + P_{j,2}(X) \tag{7}$$

where $P_{j,1}(X)$ and $P_{j,2}(X)$ are polynomials with $\operatorname{degree}(P_{j,1}(X)) < s$ and
$\operatorname{degree}(P_{j,2}(X)) < h - s$. That is, $P_{j,1}(X)$ and $P_{j,2}(X)$ are the $s$ left-hand bits and
$(h - s)$ right-hand bits of $P_j(X)$, respectively. Substituting (7) into (4), we have

$$\begin{aligned}
P_i(X) &= R_{M(X)}[(P_{i-1,1}(X)X^{h-s} + P_{i-1,2}(X))X^s] + R_{M(X)}[Q_i(X)X^h] \\
&= R_{M(X)}[(P_{i-1,1}(X) + Q_i(X))X^h] + R_{M(X)}[P_{i-1,2}(X)X^s]
\end{aligned}$$

Because $\operatorname{degree}(P_{i-1,2}(X)X^s) < h = \operatorname{degree}(M(X))$, we have
$R_{M(X)}[P_{i-1,2}(X)X^s] = P_{i-1,2}(X)X^s$. Thus,

$$P_i(X) = R_{M(X)}[(P_{i-1,1}(X) + Q_i(X))X^h] + P_{i-1,2}(X)X^s \tag{8}$$

The CRC algorithm based on (8), called Algorithm 3, is shown in Fig. 3.
Case 2: $s > h$. Multiplying both sides of (6) by $X^{s-h}$, we have

$$\begin{aligned}
P_i(X)X^{s-h} &= \left(R_{M(X)}[(P_{i-1}(X)X^{s-h} + Q_i(X))X^h]\right)X^{s-h} \\
&= R_{M(X)X^{s-h}}[(P_{i-1}(X)X^{s-h} + Q_i(X))X^h X^{s-h}] \\
&= R_{M(X)X^{s-h}}[(P_{i-1}(X)X^{s-h} + Q_i(X))X^s]
\end{aligned} \tag{9}$$

Define $L_j(X) = P_j(X)X^{s-h}$. From (9), we then have

$$L_i(X) = R_{N(X)}[(L_{i-1}(X) + Q_i(X))X^s] \tag{10}$$

where $N(X) = M(X)X^{s-h}$. Thus, $L_i(X)$ is computed from $L_{i-1}(X)$ and $Q_i(X)$.

Note that $L_0(X) = P_0(X)X^{s-h}$, where $P_0(X) = R_{M(X)}[Q_0(X)X^h]$. We then have

$$\begin{aligned}
L_0(X) &= \left(R_{M(X)}[Q_0(X)X^h]\right)X^{s-h} \\
&= R_{M(X)X^{s-h}}[Q_0(X)X^h X^{s-h}] \\
&= R_{N(X)}[Q_0(X)X^s]
\end{aligned}$$

Because $L_i(X) = P_i(X)X^{s-h}$, the term $P_i(X)$ is obtained by shifting $L_i(X)$ to the right by $(s - h)$ bits.
Note that $\operatorname{degree}(L_i(X)) < s$. We will show in Remark 2 that computing $P_i(X)$ via (10) is slightly
faster than via (6). The CRC algorithm based on (10), called Algorithm 4, is shown in Fig. 4, where
$N(X) = M(X)X^{s-h}$.

     1  P = 0;
     2  for (0 <= i < n)
     3  {
     4      P_1 = s left-hand bits of P;
     5      P_2 = (h - s) right-hand bits of P;
     6      A = P_1 + Q_i;
     7      B = R_M [A X^h];
     8      P = B + P_2 X^s;
     9  }
    10  return P;

Fig. 3 CRC Algorithm 3 for computing the check $h$-tuple $P$ from the input $s$-tuples $Q_0, \ldots, Q_{n-1}$ ($s \le h$).
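
Algorithm 3 can be sketched in C for $s = 8$ and $h = 16$, where $P_1$ is the high byte and $P_2$ the low byte of the 16-bit check tuple. The names `gf2_rem` and `crc16_algorithm3` are hypothetical, and the remainder is still computed here by plain long division (the basic technique).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder of y divided by m over GF(2). */
uint64_t gf2_rem(uint64_t y, uint64_t m)
{
    assert(m != 0);
    int dm = 63;
    while (((m >> dm) & 1) == 0)
        dm--;                              /* dm = degree of m */
    for (int k = 63; k >= dm; k--)
        if ((y >> k) & 1)
            y ^= m << (k - dm);
    return y;
}

/* Algorithm 3 (Fig. 3) with s = 8 and h = 16; m includes the X^16 term. */
uint16_t crc16_algorithm3(const uint8_t *q, size_t n, uint32_t m)
{
    assert((m >> 16) == 1);
    uint64_t p = 0;                                  /* line 1 */
    for (size_t i = 0; i < n; i++) {                 /* line 2 */
        uint64_t p1 = p >> 8;                        /* line 4: s left-hand bits      */
        uint64_t p2 = p & 0xFFu;                     /* line 5: h - s right-hand bits */
        uint64_t a = p1 ^ q[i];                      /* line 6: A = P_1 + Q_i         */
        uint64_t b = gf2_rem(a << 16, m);            /* line 7: B = R_M[A X^h]        */
        p = b ^ (p2 << 8);                           /* line 8: P = B + P_2 X^s       */
    }
    return (uint16_t)p;                              /* line 10 */
}
```

The point of this arrangement is that the division in line 7 only ever sees the $s$-bit value $A$, which is what later allows the division to be shortened or removed for special generators.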

    1  B = 0;
    2  for (0 <= i < n)
    3  {
    4      A = B + Q_i;
    5      B = R_N [A X^s];
    6  }
    7  P = h left-hand bits of B;
    8  return P;

Fig. 4 CRC Algorithm 4 for computing the check $h$-tuple $P$ from the input $s$-tuples $Q_0, \ldots, Q_{n-1}$ ($s > h$).
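
A C sketch of Algorithm 4 for $s = 16$ and $h = 8$ is given below as an illustration. The names `gf2_rem` and `crc8_algorithm4` are hypothetical; the working value $B$ holds $L_i(X) = P_i(X)X^{s-h}$, so the check tuple is recovered by one final shift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: remainder of y divided by m over GF(2). */
uint64_t gf2_rem(uint64_t y, uint64_t m)
{
    assert(m != 0);
    int dm = 63;
    while (((m >> dm) & 1) == 0)
        dm--;                              /* dm = degree of m */
    for (int k = 63; k >= dm; k--)
        if ((y >> k) & 1)
            y ^= m << (k - dm);
    return y;
}

/* Algorithm 4 (Fig. 4) with s = 16 and h = 8; m includes the X^8 term. */
uint8_t crc8_algorithm4(const uint16_t *q, size_t n, uint32_t m)
{
    assert((m >> 8) == 1);
    uint64_t big_n = (uint64_t)m << 8;     /* N = M X^(s-h), degree 16 */
    uint64_t b = 0;                        /* line 1 */
    for (size_t i = 0; i < n; i++) {       /* line 2 */
        uint64_t a = b ^ q[i];             /* line 4: A = B + Q_i     */
        b = gf2_rem(a << 16, big_n);       /* line 5: B = R_N[A X^s]  */
    }
    return (uint8_t)(b >> 8);              /* line 7: h left-hand bits of B */
}
```

On the same five 16-bit tuples used for "123456789" with one leading zero byte, this sketch should return the same value as Algorithm 2, since both evaluate the same check polynomial.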

Remark 2. Suppose that $s > h$. The check polynomial $P(X) = P_{n-1}(X) = R_{M(X)}[U_{n-1}(X)X^h]$ can
then be computed by Algorithm 2 (Fig. 2) or by Algorithm 4 (Fig. 4). We now show that, for bitwise
implementation, Algorithm 4 is slightly faster than Algorithm 2. By comparing these 2 algorithms, we
observe the following. First, the computation of $R_{M(X)}[A(X)X^h]$ (in Fig. 2) and the computation of
$R_{N(X)}[A(X)X^s]$ (in Fig. 4) have the same complexity, because each requires $s$ iterations (by Remark 1).
Next, the factor $X^{s-h}$ at line 4 of Fig. 2 disappears from line 4 of Fig. 4. Finally, one extra operation is
required at line 7 of Fig. 4 to extract the $h$ left-hand bits of the final $B(X)$. The above observations imply
that Algorithm 4 requires $n - 1$ fewer operations than Algorithm 2. Thus, for bitwise implementation, we
will use Algorithm 4 when $s > h$. □

2.4 Basic CRC Algorithms 

Given an input message $U(X)$ and a generator polynomial $M(X)$ of degree $h$, Algorithms 1-4 produce the
same CRC check tuple $P(X)$. That is, they are 4 different ways for accomplishing the same thing. The main
difference among these algorithms is how the input message is divided into $s$-tuples $Q_i(X)$. Algorithms 1
and 3 are for $s \le h$, whereas Algorithms 2 and 4 are for $s > h$. As shown later, CRC speed depends on
the choice of $s$. For flexibility, we allow the possibility that the same CRC is used by computers that have
different architectures and capabilities. For example, one computer can choose a value of $s$ for encoding a
message to transmit to another computer (with different capabilities), which can choose a different value
of $s$ for detecting the errors in the received message.

The above CRC algorithms require the polynomial divisions. In particular, Algorithm 1 requires the
polynomial division $R_{M(X)}[A(X)X^s]$, Algorithms 2 and 3 require the polynomial division $R_{M(X)}[A(X)X^h]$,
and Algorithm 4 requires the polynomial division $R_{N(X)}[A(X)X^s]$. To simplify the presentation, we will
use the single notation $B(X)$ to denote all these polynomial divisions, i.e., we define

$$B(X) = \begin{cases}
R_{M(X)}[A(X)X^s] & \text{(Algo. 1)} \\
R_{M(X)}[A(X)X^h] & \text{(Algos. 2 and 3)} \\
R_{N(X)}[A(X)X^s] & \text{(Algo. 4)}
\end{cases} \tag{11}$$

where $N(X) = M(X)X^{s-h}$. Note that $\operatorname{degree}(A(X)) < h$ in Algorithm 1, and
$\operatorname{degree}(A(X)) < s$ in Algorithms 2-4. As seen in Figs. 1-4, CRC computation using any of the above
4 algorithms requires the computation of $B(X)$ for $n$ times.

A known technique for computing $B(X)$ is to use the polynomial long division algorithm mentioned in
Remark 1. For example, consider Algorithms 2 and 3. We then have $B(X) = R_{M(X)}[A(X)X^h]$, where
$\operatorname{degree}(A(X)) < s$. Because $\operatorname{degree}(A(X)X^h) \le s + h - 1$ and
$\operatorname{degree}(M(X)) = h$, from Remark 1, $B(X)$ can be computed via the polynomial long division that
requires a loop of $s$ iterations. Similarly, it can be shown that computing $B(X)$ in Algorithms 1 and 4 also
requires a loop of $s$ iterations. That is, the computational complexity for computing $B(X)$ is $O(s)$.

Definition 1. The technique for computing the polynomial $B(X)$ as given in (11) is called the basic
technique. Using the polynomial long division, $B(X)$ can be computed in $s$ iterations. An algorithm (or a
CRC) is basic if it uses the basic technique for computing $B(X)$.

3 FAST CRCS 

Recall that we are given an input message $U_{n-1}(X) = (Q_0(X), Q_1(X), \ldots, Q_{n-1}(X))$, where $Q_i(X)$ is an
$s$-tuple. We protect this message by an $h$-bit CRC generated by a polynomial $M(X)$ of degree $h$. The check
$h$-tuple

$$P(X) = P_{n-1}(X) = R_{M(X)}[U_{n-1}(X)X^h]$$

can be computed by Algorithm 1 or 3 (if $s \le h$), or by Algorithm 2 or 4 (if $s > h$). We emphasize that each
of these algorithms requires the calculation of $B(X)$ defined in (11), which involves the polynomial division.

3.1 Fast h-Bit CRCs

Our goal is to find some CRCs that have fast implementation, i.e., to find a new family of generator 
polynomials M{X) for CRCs that have low complexity. Recall that the CRC algorithms (Figs. 1-4) depend 
on the term B{X). Computation of B{X) is also the most expensive step in the algorithms. Thus, finding 
fast CRCs requires finding the polynomials M{X) that yield fast computation of B{X). 

The first technique for computing $B(X)$ is the basic technique in Definition 1. Using the polynomial
division, we can compute $B(X)$ by a loop of $s$ iterations. In the following, we present the second technique,
called the new technique, for computing $B(X)$. While the new technique is applicable to any generator
polynomial $M(X)$, it is more effective for some special CRC generator polynomials, called the fast polynomials.
Recall that the basic CRCs can use Algorithm 1 or 3 (if $s \le h$), or Algorithm 2 or 4 (if $s > h$). However,
as seen in the following, the fast CRCs use only Algorithms 3 (for $s \le h$) and 4 (for $s > h$) for their bitwise
implementation.

We now introduce a new family of CRCs, which are generated by the following polynomials

$$F_h(X) = X^h + X^2 + X + 1 \tag{12}$$

for all $h \ge 4$. We ignore the case $h = 3$, which yields the trivial repetition code {(0000), (1111)}. We call
$F_h(X)$ the "fast polynomial," which can be factored into

$$X^h + X^2 + X + 1 = (X + 1)G_{h-1}(X)$$

where

$$G_m(X) = X^m + X^{m-1} + \cdots + X^3 + X^2 + 1 \tag{13}$$

i.e., $G_m(X)$ includes all the terms except $X$. At first, it is not clear why this particular polynomial $F_h(X)$
will speed up the computation of $B(X)$. We now introduce a technique that is applied to $F_h(X)$ to yield
fast computation of $B(X)$.
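
The factorization of the fast polynomial can be checked numerically with a carry-less multiplication, as in the following C sketch. The names `gf2_mul`, `fast_poly`, and `g_poly` are hypothetical, and polynomials are assumed to fit in 64 bits with bit $k$ holding the coefficient of $X^k$.

```c
#include <assert.h>
#include <stdint.h>

/* Carry-less (GF(2)) product of two polynomials held as bit masks. */
uint64_t gf2_mul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    while (b != 0) {
        if (b & 1)
            r ^= a;                        /* add a shifted copy of a */
        a <<= 1;
        b >>= 1;
    }
    return r;
}

/* F_h(X) = X^h + X^2 + X + 1, for 4 <= h <= 63. */
uint64_t fast_poly(int h)
{
    assert(h >= 4 && h <= 63);
    return ((uint64_t)1 << h) | 0x7u;
}

/* G_m(X): every term from X^m down to 1 except X, for 3 <= m <= 62. */
uint64_t g_poly(int m)
{
    assert(m >= 3 && m <= 62);
    return (((uint64_t)1 << (m + 1)) - 1) ^ 0x2u;
}
```

Here `gf2_mul(0x3, g_poly(h - 1))` multiplies $X + 1$ by $G_{h-1}(X)$ and should equal `fast_poly(h)` for every $h$ in the supported range.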

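By considering Algorithms 3 and 4, we have from (11)

$$B(X) = \begin{cases}
R_{M(X)}[A(X)X^h] & \text{if } s \le h \\
R_{N(X)}[A(X)X^s] & \text{if } s > h
\end{cases} \tag{14}$$

where $N(X) = M(X)X^{s-h}$, and $A(X)$ is a polynomial of degree less than $s$. We now transform $B(X)$ into
a new form that will be used by the fast CRCs. First, note that $R_{M(X)}[A(X)M(X)] = 0$ and
$R_{N(X)}[A(X)N(X)] = 0$, so adding these terms to (14) does not change $B(X)$. Thus,

$$B(X) = \begin{cases}
R_{M(X)}[A(X)(M(X) + X^h)] & \text{if } s \le h \\
R_{N(X)}[A(X)(N(X) + X^s)] & \text{if } s > h
\end{cases} \tag{15}$$

where $N(X) = M(X)X^{s-h}$.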

Definition 2. Using Algorithms 3 and 4, the technique (15) for computing the polynomial $B(X)$ is called
the new technique. The CRC that is generated by the fast polynomial $F_h(X) = X^h + X^2 + X + 1$ and uses
the new technique for computing $B(X)$ is called the fast $h$-bit CRC.

Theorem 1. Using Algorithms 3 and 4, the polynomial $B(X)$ for the fast CRC generated by
$F_h(X) = X^h + X^2 + X + 1$ is given by

$$B(X) = \begin{cases}
R_{F_h(X)}[A(X)(X^2 + X + 1)] & \text{if } s \le h \\
R_{N(X)}[A(X)X^{s-h}(X^2 + X + 1)] & \text{if } s > h
\end{cases} \tag{16}$$

where $N(X) = F_h(X)X^{s-h}$, and $A(X)$ is a polynomial of degree less than $s$. Further, using the polynomial
division, $B(X)$ can be computed with $\max(0, s - h + 2)$ iterations.

Proof. Relation (16) follows by using (15) with $M(X) = F_h(X)$ and $N(X) = F_h(X)X^{s-h}$. First,
suppose that $s \le h$. Then $B(X) = R_{F_h(X)}[A(X)(X^2 + X + 1)]$. Because $\operatorname{degree}(F_h(X)) = h$ and
$\operatorname{degree}(A(X)(X^2 + X + 1)) < s + 2$, from Remark 1, $B(X)$ can be computed with
$\max(0, s - h + 2)$ iterations.
Next, suppose that $s > h$. Then $B(X) = R_{N(X)}[A(X)X^{s-h}(X^2 + X + 1)]$. Because
$\operatorname{degree}(N(X)) = s$ and $\operatorname{degree}(A(X)X^{s-h}(X^2 + X + 1)) < 2s - h + 2$, Remark 1 implies
that $B(X)$ can also be computed with $\max(0, s - h + 2)$ iterations. □

Let us briefly compare the computational complexity of $B(X)$ for (a) the basic $h$-bit CRC generated
by $M(X)$ and (b) the fast $h$-bit CRC generated by $F_h(X)$. For the basic CRC, by Definition 1, $B(X)$
is computed via a loop of $s$ iterations, regardless of the form of $M(X)$. However, for the fast CRC, by
Theorem 1, $B(X)$ is computed via a loop of only $\max(0, s - h + 2)$ iterations. Thus, the fast CRC is much
faster than the basic CRC if $s$ is chosen such that $s - h + 2$ is much smaller than $s$. Further, if $s + 1 < h$, then
$\max(0, s - h + 2) = 0$ and $B(X) = A(X)(X^2 + X + 1)$, i.e., the polynomial division is eliminated. Section 4
presents CRC software complexity in more detail.
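
The difference in loop length can be observed directly with a counting long division, as in the C sketch below for $h = 16$. The names `gf2_rem_count`, `b_basic16`, and `b_new16` are hypothetical. Note that the count produced here depends on the actual degree of $A(X)$, so it reaches the bounds $s$ and $\max(0, s - h + 2)$ only when the top bit of the $s$-tuple is set.

```c
#include <assert.h>
#include <stdint.h>

#define F16 0x10007u   /* F_16(X) = X^16 + X^2 + X + 1 as a bit mask */

/* Long division over GF(2); *iters receives the number of loop iterations. */
uint64_t gf2_rem_count(uint64_t y, uint64_t w, int *iters)
{
    assert(w != 0 && iters != 0);
    int dw = -1, dy = -1;
    for (uint64_t t = w; t != 0; t >>= 1)
        dw++;                              /* dw = degree of w */
    for (uint64_t t = y; t != 0; t >>= 1)
        dy++;                              /* dy = degree of y, -1 if y == 0 */
    *iters = 0;
    for (int k = dy; k >= dw; k--) {
        if ((y >> k) & 1)
            y ^= w << (k - dw);
        (*iters)++;
    }
    return y;
}

/* Basic technique: B = R_F[A X^16]. */
uint64_t b_basic16(uint64_t a, int *iters)
{
    assert(a <= 0xFFFFu);
    return gf2_rem_count(a << 16, F16, iters);
}

/* New technique: B = R_F[A (X^2 + X + 1)]. */
uint64_t b_new16(uint64_t a, int *iters)
{
    assert(a <= 0xFFFFu);
    return gf2_rem_count(a ^ (a << 1) ^ (a << 2), F16, iters);
}
```

For a 16-bit tuple with its top bit set, the basic form takes 16 iterations and the new form takes 2; for an 8-bit tuple with its top bit set, the counts are 8 and 0. Both forms return the same remainder.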

We emphasize that the fast $h$-bit CRC denotes a CRC that meets the following 2 conditions: (a) it is
generated by the fast polynomial $F_h(X) = X^h + X^2 + X + 1$, and (b) the polynomial $B(X)$ is computed
via Theorem 1 by applying the new technique (15) to $F_h(X)$. That is, the fast CRC refers to a CRC that
is generated by a specific polynomial and is implemented by a specific technique. Note that a CRC that
meets only one of the above 2 conditions may not have any speed advantage over a basic CRC. For example,
suppose that, instead of the new technique (15), the basic technique (in Definition 1) is applied to the CRC
generated by the fast polynomial $F_h(X)$. This CRC is then not different from a basic CRC in terms of
computational complexity. Application of the new technique to polynomials other than $F_h(X)$ is considered
in Appendix C.

To summarize, the fast $h$-bit CRC is generated by $F_h(X) = X^h + X^2 + X + 1$. Under bitwise
implementation, the fast CRC uses Algorithm 3 if $s \le h$ and Algorithm 4 if $s > h$. The term $B(X)$ in these
algorithms is given in Theorem 1.




3.2 A Fast 16-Bit CRC



We now consider the important case $h = 16$. Many CRCs (as well as weaker checksums) used in practice
have 16 check bits, e.g., the CRC-16 and CRC-CCITT mentioned in Section 1. With a small amount of
overhead, these CRCs can have length up to $2^{15} - 1$ bits $\approx$ 4,096 bytes. Our goal here is to present a
concrete example of a new 16-bit CRC that is not only much faster than but also as good as existing 16-bit CRCs.
Our new 16-bit CRC is generated by

$$F_{16}(X) = X^{16} + X^2 + X + 1 \tag{17}$$

which can be factored into

$$F_{16}(X) = (X + 1)G_{15}(X)$$

where

$$G_{15}(X) = X^{15} + X^{14} + \cdots + X^3 + X^2 + 1$$

It can be shown that $G_{15}(X)$ is a primitive polynomial, i.e., $F_{16}(X)$ is a product of $X + 1$ and a primitive
polynomial (however, as seen later, this is not true for many values of $h$). Thus, this fast 16-bit CRC also
has length up to $2^{15} - 1$ bits. Although the polynomial (17) is different from the generator polynomials for
existing 16-bit CRCs, it does generate a CRC that has the same guaranteed error-detecting capability as
existing 16-bit CRCs. From Theorem 1, we have

$$B(X) = \begin{cases}
R_{F_{16}(X)}[A(X)(X^2 + X + 1)] & \text{if } s \le 16 \\
R_{N(X)}[A(X)X^{s-16}(X^2 + X + 1)] & \text{if } s > 16
\end{cases}$$

where $N(X) = F_{16}(X)X^{s-16}$.

In the following, we consider 2 cases: $s = 8$ and $s = 16$. First, assume that $s = 8$, i.e., the input
message is organized in 8-bit bytes. Because $s \le 16$, we have $B(X) = R_{F_{16}(X)}[A(X)(X^2 + X + 1)]$. Because
$\operatorname{degree}(A(X)) < s = 8$, we have $\operatorname{degree}(A(X)(X^2 + X + 1)) < 10$, which is smaller than
$\operatorname{degree}(F_{16}(X)) = 16$. From Remark 1, we have

$$\begin{aligned}
B(X) &= A(X)(X^2 + X + 1) \\
&= A(X)X^2 + A(X)X + A(X)
\end{aligned}$$

i.e., $B(X)$ is simply the sum of $A(X)$ and its translations. Thus, computing $B(X)$ via the new technique
requires no polynomial division. In contrast, computing $B(X)$ via the basic technique requires the polynomial
division that has a loop of $s = 8$ iterations (see Definition 1).
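
For the case $s = 8$, a division-free encoder can be sketched in C by combining Algorithm 3 with $B(X) = A(X)X^2 + A(X)X + A(X)$. This is an illustrative sketch, not the program of Fig. 8; the names `fast_crc16_s8` and `ref_crc16_f16` are hypothetical, and the second function is a plain bit-at-a-time reference included only for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fast 16-bit CRC with s = 8: Algorithm 3 with B = A (X^2 + X + 1). */
uint16_t fast_crc16_s8(const uint8_t *q, size_t n)
{
    uint16_t p = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t a = (uint16_t)((p >> 8) ^ q[i]);           /* A = P_1 + Q_i      */
        uint16_t b = (uint16_t)(a ^ (a << 1) ^ (a << 2));   /* degree(B) < 10     */
        p = (uint16_t)(b ^ (uint16_t)(p << 8));             /* P = B + P_2 X^8    */
    }
    return p;
}

/* Reference: R_F[U(X) X^16] for F = X^16 + X^2 + X + 1, one bit per step. */
uint16_t ref_crc16_f16(const uint8_t *q, size_t n)
{
    uint32_t p = 0;
    for (size_t i = 0; i < n; i++) {
        p ^= (uint32_t)q[i] << 8;
        for (int k = 0; k < 8; k++) {
            p <<= 1;
            if (p & 0x10000u)
                p ^= 0x10007u;             /* subtract F_16 when X^16 appears */
        }
    }
    return (uint16_t)p;
}
```

The loop body of `fast_crc16_s8` uses only shifts and exclusive-ORs, with no inner loop and no table, which is the property the paragraph above describes.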

Next, assume that $s = h = 16$, i.e., the input message is organized in 16-tuples. Because $s = 16$, we
have $\operatorname{degree}(A(X)) < 16$ and $\operatorname{degree}(A(X)(X^2 + X + 1)) < 18$. Thus, by Remark 1,
$B(X)$ is computed by the polynomial division that has a loop of 2 iterations. This contrasts with computing
$B(X)$ via the basic technique, which requires a loop of $s = 16$ iterations (see Definition 1). Thus, the loop
iteration count of our new technique is less than that of the basic technique by the factor of $16/2 = 8$.
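
The case $s = 16$ can be sketched in C with the 2 division steps written out explicitly. The names `fast_crc16_s16` and `ref_crc16_f16_words` are hypothetical, the input is assumed to be already packed into 16-bit tuples, and the second function is a bit-at-a-time reference used only for checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fast 16-bit CRC with s = h = 16: P = R_F[(P + Q_i)(X^2 + X + 1)]. */
uint16_t fast_crc16_s16(const uint16_t *q, size_t n)
{
    uint16_t p = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t a = (uint32_t)(p ^ q[i]);       /* A = P + Q_i, degree < 16      */
        uint32_t t = a ^ (a << 1) ^ (a << 2);    /* A (X^2 + X + 1), degree < 18  */
        if (t & 0x20000u)
            t ^= 0x2000Eu;                       /* iteration 1: cancel X^17 with F_16 X */
        if (t & 0x10000u)
            t ^= 0x10007u;                       /* iteration 2: cancel X^16 with F_16   */
        p = (uint16_t)t;
    }
    return p;
}

/* Reference: the same CRC computed one bit per step. */
uint16_t ref_crc16_f16_words(const uint16_t *q, size_t n)
{
    uint32_t p = 0;
    for (size_t i = 0; i < n; i++) {
        p ^= q[i];
        for (int k = 0; k < 16; k++) {
            p <<= 1;
            if (p & 0x10000u)
                p ^= 0x10007u;
        }
    }
    return (uint16_t)p;
}
```

As a worked instance, the single tuple `0xFFFF` gives $A(X)(X^2 + X + 1)$ = `0x2FFFD`; cancelling $X^{17}$ leaves `0xFFF3`, and no $X^{16}$ term remains, so the check tuple is `0xFFF3`.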

To summarize, when the input message is organized in $s$-tuples, it is possible to have a fast 16-bit CRC
that requires no polynomial division (when $s = 8$), or that requires the polynomial division that has only 2
loop iterations (when $s = 16$). Further, this fast 16-bit CRC has the same guaranteed error-detecting
capability as existing 16-bit CRCs.

When computing $B(X)$ via the new technique, although the case $s = 16$ requires more loop iterations
than the case $s = 8$, we will see later in Section 4 that the case $s = 16$ has lower overall computational
complexity (i.e., lower overall operation count per input byte). This is because, when $s = 16$, there is no
need to compute $P_{j,1}(X)$ and $P_{j,2}(X)$ as defined in (7). Further, the overhead processing cost per input byte
when $s = 16$ is lower than when $s = 8$. The C programs for the fast 16-bit CRC are shown in Fig. 8 and in
Fig. 12 of Appendix A.

3.3 Error-Detection Capability of Fast CRCs 

Recall that the maximum length of the $h$-bit CRC generated by (1) is $2^{h-1} - 1$ bits, i.e., this CRC has
minimum distance $d = 4$ if its total bit length $\le 2^{h-1} - 1$. In general, we define the maximum length of an
error-detection code to be the total bit length at or below which its minimum distance is $d \ge 3$, and beyond
which its minimum distance will reduce to $d \le 2$. In the following, we determine the maximum lengths of
the fast CRCs.

By definition, the period of a polynomial $G(X)$ is the smallest positive integer $i$ such that
$R_{G(X)}[X^i] = 1$. In particular, it can be shown that the period of $M(X)$ in (1), which is the product of
$X + 1$ and a primitive polynomial of degree $h - 1$, is $2^{h-1} - 1$. Note that some polynomials, such as $X^2$,
do not have periods.

The period of the fast polynomial $F_h(X) = X^h + X^2 + X + 1$ can be computed directly from the definition
(for small $h$) or from the technique in [1, Section 6.2]. The periods of $F_h(X)$, $h \ge 4$, are shown in Fig. 5.
The following theorems, which are slight variations of well-known results from cyclic codes [12, Chapter 4],
show that the maximum length of a CRC equals the period of its generator polynomial.
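
For small $h$, the period can be found directly from the definition by repeated multiplication by $X$, as in this C sketch. The name `gf2_period` and the search limit parameter are assumptions of the sketch; it returns 0 when no period is found within the limit, which is also what happens for polynomials that have no period.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest i in 1..limit with R_G[X^i] = 1, or 0 if none is found.
   g is the bit mask of G(X) and h its degree (1 <= h <= 62). */
uint64_t gf2_period(uint64_t g, int h, uint64_t limit)
{
    assert(h >= 1 && h <= 62 && (g >> h) == 1);
    uint64_t r = 1;                        /* r = X^0 mod G */
    for (uint64_t i = 1; i <= limit; i++) {
        r <<= 1;                           /* multiply by X */
        if ((r >> h) & 1)
            r ^= g;                        /* reduce modulo G */
        if (r == 1)
            return i;
    }
    return 0;
}
```

With `g = 0x10007` and `h = 16` the search returns $2^{15} - 1 = 32767$, in agreement with the statement that $G_{15}(X)$ is primitive, and with `g = 0x107` and `h = 8` it returns 127. For $X^2$ (mask `0x4`) the remainder becomes 0 and never returns to 1, so the function reports 0.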

Theorem 2. Let $C$ be a CRC generated by a polynomial $M(X)$ of degree $h > 3$. Assume that $M(X)$ is not
a multiple of $X$. Let $n_b$ and $d$ be the bit length and minimum distance of $C$, respectively. We then have

1. $d \ge 3$ if $n_b \le$ period of $M(X)$.

2. $d = 2$ if $n_b >$ period of $M(X)$.

3. $C$ detects all error bursts of length up to $h$ bits, i.e., $b = h$.

Proof. Let $t$ be the period of $M(X)$. We must have $t \ge h$. By definition, each codeword of $C$ has the form

$$V(X) = U(X)X^h + P(X)$$

where $U(X)$ is the polynomial representing the input message, and $P(X)$ is the check polynomial. Because
$P(X) = R_{M(X)}[U(X)X^h]$, we have

$$U(X)X^h = K(X)M(X) + P(X)$$

for some polynomial $K(X)$. Thus, we have

$$V(X) = U(X)X^h + P(X) = K(X)M(X)$$

i.e., $C$ is a linear code. If $d = 1$, then $X^i = K(X)M(X)$, for some $i$. This implies that $M(X) = X^j$ for
some $j$, which contradicts our assumption that $M(X)$ is not a multiple of $X$. Thus, $d \ge 2$.

1. We now prove, by contradiction, the statement $d \ge 3$ if $n_b \le$ period of $M(X)$. Thus, suppose that there
is a codeword $V(X)$ with length $n_b \le t$ and weight 2. Then $V(X) = X^i + X^j$ for some $i$ and $j$ such that
$n_b > j > i \ge 0$. Thus, $V(X) = X^i(X^{j-i} + 1)$.

We also have $V(X) = K(X)M(X)$ for some polynomial $K(X)$. Thus, $X^i(X^{j-i} + 1) = K(X)M(X)$.
Because $M(X)$ is not a multiple of $X$ by assumption, $M(X)$ must divide $X^{j-i} + 1$, i.e.,
$R_{M(X)}[X^{j-i}] = 1$. Thus, $j - i \ge t$ = period of $M(X)$. Then $j \ge t \ge n_b$, which contradicts the
condition $n_b > j$. Thus, all the codewords of length $n_b \le t$ must have weight $\ge 3$, i.e., $d \ge 3$.

2. We construct a codeword with length $> t$ and weight 2 as follows. Let $U(X) = X^{t-h}$. Then
$P(X) = R_{M(X)}[U(X)X^h] = R_{M(X)}[X^t]$. We have $P(X) = 1$ because $t$ is the period of $M(X)$. Thus, the
codeword $V(X) = U(X)X^h + P(X) = X^t + 1$ has length $t + 1$ and weight 2. That is, $d = 2$ if $n_b > t$.

3. The fact that $C$ detects all error bursts of length up to $h$ bits (i.e., $b = h$) is well-known [12]. □

Theorem 3. Let $C$ be the CRC generated by the fast polynomial $F_h(X) = X^h + X^2 + X + 1$. Let $n_b$ and $d$
be the bit length and minimum distance of $C$, respectively. We then have

1. $d = 4$ if $n_b \le$ period of $F_h(X)$.

2. $d = 2$ if $n_b >$ period of $F_h(X)$.

3. $C$ detects all error bursts of length up to $h$ bits, i.e., $b = h$.

Proof. Let $t$ be the period of $F_h(X)$. From the proof of Theorem 2, every codeword of $C$ has the form
$V(X) = K(X)(X^h + X^2 + X + 1)$ for some polynomial $K(X)$. Thus, the codewords of $C$ have even weight,
i.e., $d$ is even.

Suppose now that the input message is $U(X) = 1$. Then $P(X) = R_{F_h(X)}[X^h] = X^2 + X + 1$, which
implies $V(X) = U(X)X^h + P(X) = X^h + X^2 + X + 1$. That is, the codeword $V(X)$ has weight 4. Thus, $d$
is either 2 or 4. From Theorem 2.1, we must have $d = 4$ if $n_b \le t$. From Theorem 2.2, we must have $d = 2$
if $n_b > t$. The fact that $C$ detects all error bursts of length up to $h$ bits is well-known [12]. □

Parts 1 and 2 of Theorem 2 show that the maximum length of a CRC equals the period of its generator
polynomial (which, by assumption, is not a multiple of $X$). Theorem 2 also explains why, as seen above,
both the maximum length of the $h$-bit CRC generated by (1) and the period of its generator polynomial
equal $2^{h-1} - 1$. Similarly, parts 1 and 2 of Theorem 3 show that the maximum length of the fast $h$-bit CRC
equals the period of its generator polynomial $F_h(X) = X^h + X^2 + X + 1$.

Fig. 5 shows that the maximum length of the fast $h$-bit CRC is also $2^{h-1} - 1$ in many important
cases, namely when $h$ = 8, 16, 24, 48, 64, 128. In fact, $F_h(X) = X^h + X^2 + X + 1$ is also the product of
$X + 1$ and a primitive polynomial at these values of $h$, i.e., the polynomial $G_{h-1}(X)$ in (13) is primitive
when $h$ = 8, 16, 24, 48, 64, 128. Fig. 5 also shows that the maximum lengths of many fast $h$-bit CRCs are
substantially less than the upper bound $2^{h-1} - 1$ (e.g., when $h = 12$ and $h = 32$). However, in Appendix C,
we apply our new technique to more general generator polynomials to yield other fast CRCs whose maximum
lengths can approach the upper bound.

Fig. 5 The period of $F_h(X) = X^h + X^2 + X + 1$ [= the maximum length of the fast $h$-bit CRC generated by $F_h(X)$].

4 CRC SOFTWARE COMPLEXITY 

We now analyze and compare CRC software complexity. Software complexity of an algorithm refers
to the number of operations (i.e., operation count) used to implement the algorithm. Our goal in
this paper is to compute the CRC check $h$-tuple $P(X)$ for an input message that consists of $n$ tuples
$Q_0(X), Q_1(X), \ldots, Q_{n-1}(X)$. Each tuple $Q_i(X)$ has $s$ bits. This CRC can be either a basic CRC generated
by a polynomial $M(X)$ of degree $h$, or the fast CRC generated by $F_h(X) = X^h + X^2 + X + 1$. For bitwise
implementation, while Algorithm 1, 2, 3, or 4 can be used for the basic CRC, only Algorithm 3 or 4 is
used for the fast CRC. The check tuple $P(X)$ is computed by using a loop that computes $B(X)$ $n$ times,
where $B(X)$ is given in Definition 1 for the basic CRC and in Theorem 1 for the fast CRC.

In this section, we compute $e_b$ and $e_f$, which denote the software operation counts per input byte required
for computing the check tuple $P(X)$ for the basic CRC and the fast CRC, respectively. These operation
counts will then be used to compare the complexity among our fast CRCs, the basic CRCs, the other fast
CRCs in [5], and the block-parity checksum. An error-detection code is said to be "faster" than another if,
for a similar level of memory requirement, it has lower software complexity.

4.1 General Complexity Analysis 

We now provide the complexity analysis for the important case $s = h$ for the basic CRC and the fast CRC
(other cases can be analyzed similarly). Both Algorithms 2 and 4 (shown in Figs. 2 and 4) then reduce to
Fig. 6. Here, we have

$$B(X) = R_{M(X)}\left[A(X)X^s\right] \tag{18}$$

for the basic CRC (see Definition 1), and

$$B(X) = R_{F_h(X)}\left[A(X)(X^2 + X + 1)\right] \tag{19}$$

for the fast CRC (see Theorem 1), where $A(X)$ is a polynomial of degree less than $s$. Note that different
CRC algorithms refer to different techniques for computing $B(X)$. In particular, a CRC algorithm is called
table lookup or bitwise, depending on whether the term $B(X)$ in the algorithm is computed with or without
table lookup. The bitwise technique is presented in this section. The table-lookup technique is presented in
Appendix A.

    1   B = 0;
    2   for (0 ≤ i < n)
    3   {
    4       A = B + Q_i;
    5       B = R_M[A X^s];                 (for basic CRC)
            B = R_F[A (X^2 + X + 1)];       (for fast CRC)
    6   }
    7   P = B;
    8   return P;

Fig. 6 CRC algorithm ($s = h$).



Remark 3. The term $B(X) = R_{M(X)}[A(X)X^s]$ in (18) can be computed as follows. First, we write
$A(X)X^s = (\cdots(A(X)X)\cdots)X$. Thus, $B(X)$ can be computed in $s$ iterations via the following pseudocode:

    1   for (0 ≤ j < s)
    2       A = R_M[AX];
    3   B = A;

where $R_M[AX]$ is computed by

$$R_M[AX] = \begin{cases} AX + M & \text{if msb}(A) = 1 \\ AX & \text{if msb}(A) = 0 \end{cases} \tag{20}$$

where msb$(A)$ denotes the most significant bit of $A$. The term $R_M[AX]$ in (20) can also be computed by
using a table $T[\,]$ of only 2 entries defined by $T[0] = 0$ and $T[1] = M$. We then have

$$R_M[AX] = AX + T[\text{msb}(A)] \tag{21}$$

□
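
The two forms (20) and (21) of the division step can be sketched in C as follows. This is only an illustration under our own assumptions: $h = 16$, the CRC-16 generator polynomial, and the hypothetical function names step_ifelse, step_table, and mul_xs_mod_m. As in the programs of this paper, the term $X^h$ of $M(X)$ is implicit, because the bit shifted out of the 16-bit word is discarded.

```c
#include <stdint.h>

#define CRC16_M 0x8005u   /* low 16 bits of X^16 + X^15 + X^2 + 1 */

/* (20): one division step R_M[A X], written with an if-else. */
uint16_t step_ifelse(uint16_t a)
{
    if ((a & 0x8000u) != 0) return (uint16_t)((a << 1) ^ CRC16_M);
    else                    return (uint16_t)(a << 1);
}

/* (21): the same step with the 2-entry table T[0] = 0, T[1] = M. */
uint16_t step_table(uint16_t a)
{
    static const uint16_t T[2] = { 0x0000u, CRC16_M };
    return (uint16_t)((a << 1) ^ T[a >> 15]);   /* a >> 15 is msb(A) */
}

/* B = R_M[A X^s], obtained by applying the step s times. */
uint16_t mul_xs_mod_m(uint16_t a, int s)
{
    int j;
    for (j = 0; j < s; j = j + 1)
        a = step_table(a);
    return a;
}
```

The table form replaces the branch by an indexed load, which may or may not be cheaper on a given processor; the operation counts used in this paper do not distinguish between the two.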

Let $u$ be the operation count required for computing $R_{M(X)}[A(X)X]$. Using Remark 3, the operation
count required for computing $B(X)$ in (18) for the basic CRC is then $s(u + l_s)$, where $l_s$ denotes the operation
count for the loop overhead shown at line 1 of the pseudocode in Remark 3 (in particular, $l_s = 0$ if loop
unrolling is used).

Let us now consider the term $B(X)$ in (19) for the fast CRC. We have

$$B(X) = R_{F_h(X)}\left[A(X)X^2\right] + R_{F_h(X)}\left[A(X)X\right] + A(X) = R_{F_h(X)}\left[B_1(X)X\right] + B_1(X) + A(X) \tag{22}$$

where $B_1(X) = R_{F_h(X)}[A(X)X]$, which has operation count $u$. After $B_1(X)$ is computed, $R_{F_h(X)}[B_1(X)X]$
also has operation count $u$. There are also 2 binary additions (i.e., 2 XOR operations) in (22). Thus, the
operation count required for computing $B(X)$ in (22) for the fast CRC is $2u + 2$.
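
The identity (22) can be checked numerically. The C sketch below, written under our own assumptions ($h = 16$ and the hypothetical names fast_b and reference_b), computes $B(X)$ once by the two division steps of (22) and once by multiplying $A(X)$ by $X^2 + X + 1$ and reducing the product modulo $F_{16}(X)$ with ordinary long division.

```c
#include <stdint.h>

#define F16_LOW 0x0007u   /* low 16 bits of F_16(X) = X^16 + X^2 + X + 1 */

/* One division step R_F[A X] for h = 16. */
static uint16_t step_f(uint16_t a)
{
    uint16_t shifted = (uint16_t)(a << 1);
    return (a & 0x8000u) ? (uint16_t)(shifted ^ F16_LOW) : shifted;
}

/* (22): B = R_F[B1 X] + B1 + A, where B1 = R_F[A X]. */
uint16_t fast_b(uint16_t a)
{
    uint16_t b1 = step_f(a);
    return (uint16_t)(step_f(b1) ^ b1 ^ a);
}

/* Reference: form A(X)(X^2 + X + 1) as a product of degree at most 17
   and reduce it modulo F_16(X) by long division. */
uint16_t reference_b(uint16_t a)
{
    uint32_t p = ((uint32_t)a << 2) ^ ((uint32_t)a << 1) ^ (uint32_t)a;
    int k;
    for (k = 17; k >= 16; k = k - 1)
        if (p & (1UL << k))
            p ^= (uint32_t)(0x10007UL << (k - 16));
    return (uint16_t)p;
}
```

Both functions agree for every 16-bit value of $A$, which is what (22) asserts; only fast_b corresponds to the operation count $2u + 2$.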

Let us now determine the total operation counts $t_b$ and $t_f$ for computing the check tuple $P(X)$ for the
basic CRC and the fast CRC, respectively. The CRC algorithm for computing $P(X)$, which is shown in
Fig. 6, has a loop of $n$ iterations. In addition to the operation count for $B(X)$, there is also one addition as
indicated in line 4 of Fig. 6. Let $l_n$ be the operation count for the loop overhead shown at line 2 of Fig. 6.
We then have

$$t_b = n[l_n + 1 + s(u + l_s)] \tag{23}$$

$$t_f = n[l_n + 1 + (2u + 2)] = n(l_n + 3 + 2u) \tag{24}$$

The basic CRC and the fast CRC require $t_b$ and $t_f$ operations, respectively, to compute the check tuple
$P(X)$ for the input message that has $ns$ bits, i.e., $t_b/(ns)$ and $t_f/(ns)$ operations are required per input bit.
Recall that $e_b$ and $e_f$ denote the operation counts per input 8-bit byte required for computing the check tuple
$P(X)$, for the basic CRC and the fast CRC, respectively. We then have $e_b = 8t_b/(ns)$ and $e_f = 8t_f/(ns)$.
Using (23) and (24), we have

$$e_b = \frac{8t_b}{ns} = \frac{8[l_n + 1 + s(u + l_s)]}{s} \tag{25}$$

$$e_f = \frac{8t_f}{ns} = \frac{8(l_n + 3 + 2u)}{s} \tag{26}$$

$$\frac{e_b}{e_f} = \frac{t_b}{t_f} = \frac{l_n + 1 + s(u + l_s)}{l_n + 3 + 2u} \tag{27}$$

Simple estimates are $t_b \approx nsu$ [by ignoring $l_n + 1$ and $l_s$ in (23)] and $t_f \approx 2nu$ [by ignoring $l_n + 3$ in (24)].
Substituting these into (27), we have

$$\frac{e_b}{e_f} = \frac{t_b}{t_f} \approx \frac{s}{2} = \frac{h}{2} \tag{28}$$

i.e., the fast CRC is approximately $h/2$ times faster than the basic CRC.
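
The counts (25) and (26) are easy to tabulate. The following C sketch (the function names e_basic and e_fast are our own, hypothetical choices) evaluates them for given $s$, $l_n$, $l_s$, and $u$; the numerical values of $l_n$, $l_s$, and $u$ must be supplied by the caller, since they depend on how the programs are written.

```c
/* (25): operations per input byte for the basic CRC, case s = h. */
double e_basic(int s, double ln, double ls, double u)
{
    return 8.0 * (ln + 1.0 + s * (u + ls)) / s;
}

/* (26): operations per input byte for the fast CRC, case s = h. */
double e_fast(int s, double ln, double u)
{
    return 8.0 * (ln + 3.0 + 2.0 * u) / s;
}
```

For instance, with the assumed values $l_n = l_s = 2$ and $u = 3.5$, the calls e_basic(16, 2, 2, 3.5) and e_fast(16, 2, 3.5) give 45.5 and 6 operations per byte, whose ratio 7.58 is close to the estimate $h/2 = 8$ of (28).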



4.2 CRC Complexity Under C Implementation 

Figs. 7 and 8 show the C programs for the basic CRC and the fast CRC, respectively, which are based on
Fig. 6 ($s = h$). For illustration, we let $s = 16$ in the figures, and $M(X) = X^{16} + X^{15} + X^2 + 1$ (which
generates the CRC-16) in Fig. 7. However, the following results are also valid for other values of $s$ and other
generator polynomials.

We use the following 2 rules to count the number of software operations (see Appendix A): (R1) The operation
count of a program statement is defined as the number of operations, other than the equal sign (=), that
appear in that statement. (R2) For an if-statement, we average the operation count of the if-statement and
the operation count of its alternative (e.g., an else-statement).

The non-zero operation count for each C program statement is recorded between the comment quotes
(/* */). The programs show that $l_n = l_s = 2$. Using (20) of Remark 3, we have $u = 3$ if msb$(A) = 0$ and
$u = 4$ if msb$(A) = 1$. Using rule (R2), we have $u = 3.5$ (which is the average of 3 and 4), as recorded in
Figs. 7 and 8. Substituting these values of $l_n$, $l_s$, and $u$ into (25) and (26), we obtain $e_b = 8(3 + 5.5s)/s$ and
$e_f = 96/s$. Thus, we have

$$\frac{e_b}{e_f} = \frac{8(3 + 5.5s)}{96} = 0.25 + 0.458h \tag{29}$$

which is within 10% of (28). For example, let $s = h = 16$. Then $e_b = 8(3 + 5.5 \times 16)/16 = 45.5$ and
$e_f = 96/16 = 6$. Thus, $e_b/e_f = 45.5/6 = 7.58$, i.e., the fast CRC is 7.58 times "faster" than the basic CRC.
Further, if $s = h = 64$, then $e_f = 1.50$, $e_b = 44.4$, and $e_b/e_f = 29.6$, i.e., the fast 64-bit CRC is 29.6 times
faster than the basic 64-bit CRC. These results are recorded in Fig. 9.
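
Because $X^{16} \equiv X^2 + X + 1$ modulo $F_{16}(X)$, the fast program should return the same check tuple as the basic program when the latter is run with the generator polynomial $F_{16}(X)$. The C sketch below illustrates this cross-check for $s = h = 16$. It follows the structure of the two programs described above, but the function names, the use of fixed-width types, and the extra parameter m are our own assumptions.

```c
#include <stdint.h>

/* Basic bitwise CRC (s = h = 16). The low 16 bits of the generator
   polynomial are passed in m (hypothetical parameter). */
uint16_t crc16_bitwise(const uint16_t *q, int n, uint16_t m)
{
    uint16_t p = 0;
    int i, j;
    for (i = 0; i < n; i = i + 1) {
        p = (uint16_t)(p ^ q[i]);
        for (j = 0; j < 16; j = j + 1) {          /* s division steps */
            if ((p & 0x8000u) != 0) p = (uint16_t)((p << 1) ^ m);
            else                    p = (uint16_t)(p << 1);
        }
    }
    return p;
}

/* Fast CRC generated by F_16(X): two division steps and two XORs per tuple. */
uint16_t crc16_fast(const uint16_t *q, int n)
{
    const uint16_t f = 0x0007u;                   /* X^2 + X + 1 */
    uint16_t a, c, p = 0;
    int i;
    for (i = 0; i < n; i = i + 1) {
        a = (uint16_t)(p ^ q[i]);
        c = (a & 0x8000u) ? (uint16_t)((a << 1) ^ f) : (uint16_t)(a << 1);
        p = (c & 0x8000u) ? (uint16_t)((c << 1) ^ f) : (uint16_t)(c << 1);
        p = (uint16_t)(p ^ c ^ a);                /* (22) */
    }
    return p;
}
```

The two functions differ only in how many division steps they execute per tuple (16 versus 2), which is the source of the ratio in (29).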

We now briefly present the complexity results for $s, h \in \{8, 16, 32, 64\}$, but without the restriction $s = h$.
From (37) and (39) of Appendix A, we have

$$e_b = \begin{cases} 8(4 + 5.5s)/s & \text{if } s < h \\ 8(3 + 5.5s)/s & \text{if } s \ge h \end{cases} \tag{30}$$

$$e_f = \begin{cases} 80/s & \text{if } s < h - 1 \\ 100/s & \text{if } s = h - 1 \\ 96/s & \text{if } s = h \\ 8[12 + 5.5(s - h)]/s & \text{if } s > h \end{cases} \tag{31}$$

As an example, consider a basic 16-bit CRC and the fast 16-bit CRC, which are used to protect an
input message consisting of 8-bit bytes, i.e., $h = 16$ and $s = 8$. From the above formulas, we have $e_b =
8(4 + 5.5 \times 8)/8 = 48$ and $e_f = 80/8 = 10$. That is, the basic CRC and the fast CRC use 48 and 10 operations
per input byte, respectively, to compute their check tuples. Thus, we have $e_b/e_f = 48/10 = 4.8$, i.e., the
fast CRC is 4.8 times faster than the basic CRC. The values of $e_b$, $e_f$, and $e_b/e_f$ for various $(h, s)$ pairs are
recorded in Fig. 9. The results show that the complexity of the basic CRCs is rather insensitive to the values
of $h$ and $s$, namely, $e_b$ varies from 44.4 to 48 (the variation is only 8.1%). In contrast, the complexity of the
fast CRCs is very sensitive to the values of $h$ and $s$, namely, $e_f$ varies from 1.50 up to 40.0.

For a given $h$, recall from Section 2.1 that we are free to choose the value of $s$. The complexity of the
basic CRCs is rather insensitive to the choice of $s$. As seen in Fig. 9, when $h \in \{8, 16, 32, 64\}$, the complexity
of the fast CRCs is fairly low when $s \le h$, and is minimized when $s = h$. When $h \notin \{8, 16, 32, 64\}$, it is
shown in Appendix A that the complexity of the fast CRCs is minimized (i.e., $e_f$ is minimized) either at
$s = h$ or at $s = h - 2$.

To summarize, we introduce the new family of CRC generator polynomials that have the explicit form
$F_h(X) = X^h + X^2 + X + 1$, for all $h > 4$, as well as the new technique (15) for their implementation. This
family includes $F_8(X)$, which generates the ATM CRC-8. For this particular CRC, by choosing $s = h = 8$,
our new technique provides a new bitwise implementation that is 3.92 times faster than the basic bitwise
technique (see Fig. 9).

    unsigned short basic_CRC (int n, unsigned short *Q)
    {
        int i, j, s;
        unsigned short K, M, P;
        s = 16;
        M = 0x8005;                              /* M = X^16+X^15+X^2+1 */
        K = 0x8000;                              /* K = 2^(h-1), h=s=16 */
        P = 0;
        for (i=0; i<n; i=i+1)                    /* 2 */
        {
            P = P ^ Q[i];                        /* 1 */
            for (j=0; j<s; j=j+1)                /* 2 */
            {
                if ((P&K) != 0) P = (P<<1) ^ M;  /* 3.5 */
                else            P = P<<1;
            }
        }
        return P;
    }

Fig. 7 C program for the basic $h$-bit CRC ($s = h$).

    unsigned short fast_CRC (int n, unsigned short *Q)
    {
        int i;
        unsigned short A, C, F, K, P;
        F = 0x7;                                 /* F = X^16+X^2+X+1 */
        K = 0x8000;                              /* K = 2^(h-1), h=s=16 */
        P = 0;
        for (i=0; i<n; i=i+1)                    /* 2 */
        {
            A = P ^ Q[i];                        /* 1 */
            if ((A&K) != 0) C = (A<<1) ^ F;      /* 3.5 */
            else            C = A<<1;
            if ((C&K) != 0) P = (C<<1) ^ F;      /* 3.5 */
            else            P = C<<1;
            P = P ^ C ^ A;                       /* 2 */
        }
        return P;
    }

Fig. 8 C program for the fast $h$-bit CRC ($s = h$).

Fig. 9 Software complexity for the basic $h$-bit CRCs ($e_b$) and the fast $h$-bit CRCs ($e_f$).

Remark 4. There exist well-known techniques for reducing the operation counts used in CRC implementation.
An example is the use of table lookup (at the cost of increased memory and cache usage), which
is presented in Appendix A. Note that, to keep our C programs compact, readable, and general, we ignore
software optimization techniques (such as loop unrolling) in our C programs. However, these techniques
certainly can be used to reduce the operation counts in the programs. For example, if loop unrolling is used
(at the cost of code size expansion) in the inner for-loop of the C program in Fig. 7, then the index increment
and the end-of-loop test are eliminated, i.e., the loop overhead $l_s$ is reduced from $l_s = 2$ to $l_s = 0$. Using
loop unrolling (i.e., $l_s = 0$), it can be shown that (30) and (31) reduce to

$$e_b = \begin{cases} 8(4 + 3.5s)/s & \text{if } s < h \\ 8(3 + 3.5s)/s & \text{if } s \ge h \end{cases}$$

$$e_f = \begin{cases} 80/s & \text{if } s < h - 1 \\ 100/s & \text{if } s = h - 1 \\ 96/s & \text{if } s = h \\ 8[12 + 3.5(s - h)]/s & \text{if } s > h \end{cases}$$

□

4.3 Other Techniques for Error-Detection Codes

The complexity results for the basic CRC algorithm, which are rather insensitive to the input parameters
$s$, $h$, and the form of the generator polynomial $M(X)$, are shown in Fig. 9. In particular, when $h = 16$ and
$s = 8$, we have $e_b = 48$ operations per input byte. Our CRC software implementation in C for this case is
shown in Fig. 11 of Appendix A. It is more efficient than the one given in [2, pp. 555-556], which has 63
operations per input byte according to rules (R1) and (R2).

There are other CRC algorithms that are much faster than the basic algorithm. As expected, those
algorithms are effective for some particular generator polynomials. For example, the clever "shift and add"
algorithm of [5] is fast for the CRCs generated by two particular polynomials, $M_1(X)$ (for $h = 32$) and
$M_2(X)$ (for $h = 64$), each having four terms, which are found by computer search [5]. According to rules (R1)
and (R2) for determining the operation counts, these CRCs use 20 operations to process each tuple of $s = 32$
input bits (see Fig. 2 in [5]). Thus, these CRCs use 5 operations per input byte. In contrast, from Fig. 9,
for $s = 32$, our fast CRCs use only 3 and 2.5 operations per input byte for $h = 32$ and $h = 64$, respectively.
Thus, our fast CRCs are faster than the above shift-and-add CRCs. Further, our fast 64-bit CRC is even
faster when $s = 64$, because it uses only 1.5 operations per input byte (see Fig. 9).

As mentioned in Section 1, alternatives to CRCs are checksums. Although checksums are weaker than
CRCs, they can be substantially faster than CRCs. For example, let $s = h$ and consider the block-parity
checksum. The check tuple $P(X)$ of this checksum is simply the sum of all the input tuples, i.e., $P(X) =
\sum_{i=0}^{n-1} Q_i(X)$. As shown in Section B.1, the operation count per input byte required for computing $P(X)$
of the checksum is $e = 24/s$. From (31), the fast CRC has $e_f = 96/s$. Thus, $e_f/e = 96/24 = 4$, i.e., the
checksum is 4 times faster than the fast CRC.
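
For comparison, the block-parity checksum described above can be sketched in C as follows. This is our own minimal illustration for $s = h = 16$ (the function name block_parity16 is hypothetical, and the program of Section B.1 may differ in detail). Under rule (R1), each loop iteration uses 3 operations (<, +, and ^), which is consistent with $e = 8 \times 3/s = 24/s$.

```c
#include <stdint.h>

/* Block-parity checksum for s = h = 16: the binary sum (XOR) of all
   input tuples Q[0], ..., Q[n-1]. */
uint16_t block_parity16(const uint16_t *q, int n)
{
    uint16_t p = 0;
    int i;
    for (i = 0; i < n; i = i + 1)      /* 2 operations: < and + */
        p = (uint16_t)(p ^ q[i]);      /* 1 operation: ^ */
    return p;
}
```

Note that two errors in the same bit position of two different tuples cancel in the XOR, which illustrates why this checksum is weaker than a CRC.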

5 SUMMARY AND EXTENSION 

Error control coding is essential for reliable transmission and storage, and CRCs are known to be effective
for error detection. In software, an $h$-bit CRC is typically implemented by dividing the input message into
$s$-tuples (i.e., blocks of $s$ bits). The output CRC check bits are obtained by recursively carrying out the
polynomial division on these tuples.

Thus, the crucial part in CRC computation is the polynomial division on $s$-tuples. For the basic CRCs,
this division requires $s$ iterations, which may be expensive for many applications. A common technique for
reducing the many steps during CRC computation is to use additional memory in the form of table lookup.
In this paper, we introduce the fast $h$-bit CRCs, which are generated by $F_h(X) = X^h + X^2 + X + 1$, as well
as the new technique (15) to implement them. Using our fast CRCs, the polynomial division on $s$-tuples
requires only $\max(0, s - h + 2)$ iterations, which is much less than the $s$ iterations required for the basic
CRCs, as long as $s$ is chosen such that $s - h + 2$ is much smaller than $s$. We study the computational
complexity of the CRCs, which refers to the operation count per input byte required for computing the CRC
check tuples. Our fast CRCs have low complexity and require no table lookup. For the important case $s = h$,
the fast $h$-bit CRCs are approximately $h/2$ times faster than the basic $h$-bit CRCs.

As an illustration, we implement the CRCs in the C programming language, and then study their
computational complexity for the bitwise technique (i.e., without table lookup). We show that the complexity of
the fast $h$-bit CRCs varies greatly with $s$, and is minimized either at $s = h - 2$ or at $s = h$. In contrast,
the complexity of the basic $h$-bit CRCs varies little with $s$. Because modern computers typically process
information in bytes or words, we also present the complexity results when $s$ is restricted to multiples of
byte size and word size.

In the Appendices, we provide several extensions to the baseline ideas presented in this paper. In particular,
we present the results for CRC table-lookup techniques, which illustrate tradeoffs between computational
complexity and memory requirement. We show that when $s = h$, the fast CRCs can be made 20 percent
faster by using tables of only 4 entries. We apply our new technique to some weaker CRCs to yield even
faster CRCs, i.e., there are tradeoffs between speed and capability. Further, we use the new technique to
construct some fast extended Hamming perfect codes. In particular, we construct $h$-bit non-CRC codes
that not only have low complexity but also have the following optimal properties. They have the minimum
distance $d = 4$, the burst-error-detecting capability $b = h$, and the maximum code length $2^{h-1}$. We also
apply the new technique to arbitrary CRCs, and then determine the conditions under which the new technique
remains effective. In particular, the new technique is substantially faster than the basic technique
for the CRC-64-ISO generated by $X^{64} + X^4 + X^3 + X + 1$. Using computer search, we obtain CRCs that
have minimum distance greater than 4 and can be efficiently implemented by the new technique. We also
obtain CRC weight distributions required for estimating the undetected error probability over binary symmetric
channels. Finally, we show how the CRC algorithms, which are originally designed for sequential
implementation on a single processor, can be adapted for parallel implementation on multiple processors.

APPENDIX A CRC SOFTWARE IMPLEMENTATION AND COMPLEXITY EVALUATION 

The purpose of this appendix is to present software implementation for the CRC algorithms as well as
to evaluate their computational complexity. Software complexity of an algorithm refers to the number
of operations (i.e., operation count) used to implement the algorithm. Consider an $h$-bit CRC, which is
generated by a polynomial $M(X)$ of degree $h$. Our goal is to compute the check $h$-tuple $P(X)$ for an input
message that consists of $n$ tuples $Q_0(X), Q_1(X), \ldots, Q_{n-1}(X)$. Each tuple $Q_i(X)$ has $s$ bits.

The CRC can be implemented by any of the 4 algorithms shown in Figs. 1-4. Although the value of $h$ is
fixed, we are free to choose the value of $s$. Algorithms 1 and 3 are for $s < h$, whereas Algorithms 2 and 4
are for $s \ge h$. One algorithm can be faster than another, depending on the value of $s$ and the form of $M(X)$.
For example, Remark 2 shows that, for bitwise implementation, Algorithm 4 is faster than Algorithm 2
when $s > h$. Thus, we will use Algorithm 4 for bitwise implementation when $s \ge h$, as indicated in Fig. 10.
As stated in Theorem 1, Algorithm 3 must be used for the fast CRCs when $s < h$. Fig. 10 lists the CRC
algorithms that are used in our software implementation.

We recognize that accurate software evaluation is complicated, and requires experiments with different
processors, memory organizations, programming languages, and compilers. Other complicating factors
include programming styles and the extent to which the CRCs must share with (or compete against) other
concurrent or interrupting programs. Instead of dealing with these complex issues, which are beyond the scope
of this paper, we simply use software operation counts for our complexity evaluation. Our technique of software
comparison is as follows. We write a program (e.g., in C) for each CRC. We then use the operation count as
the primary measure of complexity, and a CRC is said to be "faster" than another if it has lower operation
count.

We now determine the software complexity of the CRC algorithms, which refers to the operation count
per input message byte required for computing the check $h$-tuple. Let us examine Algorithms 1-4 (shown
in Figs. 1-4). For each algorithm, the check tuple $P(X)$ is computed by using a loop that computes $B(X)$
$n$ times, where $B(X)$ is given in Definition 1 for the basic CRC and in Theorem 1 for the fast CRC.
In addition to $B(X)$, we also need to compute all the other terms inside the loop (which include the loop
overhead). Let $r$ and $x$ be the operation counts required for computing $B(X)$ and the other terms inside the
loop, respectively. Let $y$ be the operation count required for computing the terms outside the loop. Further,
for each algorithm, let $t$ be the total operation count required for computing the CRC check tuple from the
input message that consists of $n$ tuples. We then have $t = (x + r)n + y$.

Let $e$ be the operation count per input byte required for computing the check $h$-tuple. Each byte has 8
bits. Because $t$ is the operation count for computing the check $h$-tuple from the $ns$ input message bits, we
have $e = 8t/(ns) = 8[(x + r)n + y]/(ns)$, i.e.,

$$e = \frac{8(x + r)}{s} + \frac{8y}{ns} \tag{32}$$

In the following, we consider $h$, $s$, and $n$ to be independent variables, and our goal is to compute $e$ in terms of
$h$, $s$, and $n$ for both the basic CRCs and the fast CRCs. That is, we can write $e = e(s, h, n)$. To compute $e$,
we need to determine $r$, $x$, and $y$, to which we add the subscripts $b$ and $f$ when they refer to the basic CRCs
and the fast CRCs, respectively. That is, $r_b$, $x_b$, $y_b$, and $e_b$ refer to the basic CRCs, while $r_f$, $x_f$, $y_f$, and $e_f$
refer to the fast CRCs.

We present CRC implementation with and without table lookup. Our software programs are for $w$-bit
computers that satisfy $s \le w$ and $h \le w$ (however, we allow the possibility that $h + s > w$). For example,
32-bit computers are for $s, h \le 32$ bits, while 64-bit computers are for $s, h \le 64$ bits (future 128-bit computers
are for $s, h \le 128$ bits). To be specific, we implement the CRC algorithms in C, which is a highly portable
general-purpose computer programming language (certainly, they can also be implemented in other computer
languages). We use the following 2 simple rules to count the number of software operations [15]:

(R1) The operation count of a program statement is defined as the number of operations, other than the
equal sign (=), that appear in that statement.

(R2) For an if-statement, we average the operation count of the if-statement and the operation count of its
alternative (e.g., an else-statement).

Let us consider examples on how to use rule (R1). The statement C = (A<<1)^F will count as 2
operations (<< and ^). Note that "=" does not count as an operation. Next, consider the statement
for(i=0; i<n; i=i+1){ }. This implements a (null) loop of n iterations, each iteration has 2 operations
(< and +). Thus, the total operation count for this loop statement is 2n. The for-loop above is equivalent
to the while-loop i=0; while (i<n) {i=i+1;} which, of course, also has 2n operations.

We now show examples about rule (R2). Suppose that K = 1, and consider the following 2 statements:

    if ((A&K) != 0) C = (A<<1)^F;
    else            C = A<<1;

Here, the if-statement has 4 operations (&, !=, <<, ^), and the else-statement has 3 operations (&, !=, <<).
Thus, the above 2 statements can be considered as a single statement that has 3.5 operations (i.e., the average
of 4 and 3).

The above 2 statements are equivalent to the following 2 statements:

    C = A<<1;
    if ((A&K) != 0) C = C^F;

Here, the first statement has 1 operation, and the second statement has 2.5 operations. Thus, the 2 statements
together also have 3.5 operations as expected. Note that (A&K) ∈ {0, 1}, because K = 1. Here, for simplicity,
we assume that (A&K) takes the values 0 and 1 with equal probability of 1/2. Suppose now that K = 3. We
then have (A&K) ∈ {0, 1, 2, 3}. By assuming that (A&K) takes the values 0, 1, 2, and 3 with equal probability
of 1/4, the above if-statement (which has 4 operations) is executed with probability 3/4 and the else-statement
(which has 3 operations) is executed with probability 1/4. Thus, these if-else statements can be
considered as a single statement that has 4 x 3/4 + 3 x 1/4 = 3.75 operations.
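
The equivalence of the two statement forms above, and the averaging used in rule (R2), can be illustrated with the following C sketch. The function names and the use of 16-bit variables are our own assumptions; K and F are passed as parameters so that both K = 1 and K = 3 can be tried.

```c
#include <stdint.h>

/* Form 1: if-else (4 operations on the if branch, 3 on the else branch). */
uint16_t form_ifelse(uint16_t a, uint16_t k, uint16_t f)
{
    uint16_t c;
    if ((a & k) != 0) c = (uint16_t)((a << 1) ^ f);
    else              c = (uint16_t)(a << 1);
    return c;
}

/* Form 2: unconditional shift followed by a conditional XOR. */
uint16_t form_shift_then_xor(uint16_t a, uint16_t k, uint16_t f)
{
    uint16_t c = (uint16_t)(a << 1);
    if ((a & k) != 0) c = (uint16_t)(c ^ f);
    return c;
}

/* Average count under rule (R2) when the if branch (4 operations) is
   taken with probability p_if and the else branch (3 operations)
   with probability 1 - p_if. */
double average_ops(double p_if)
{
    return 4.0 * p_if + 3.0 * (1.0 - p_if);
}
```

With p_if = 1/2 the average is 3.5 operations, and with p_if = 3/4 it is 3.75, which matches the two cases K = 1 and K = 3 discussed above.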





                    bitwise               table lookup
    basic CRC       Algo. 1 (s < h)       Algo. 3 (s < h)
                    Algo. 4 (s ≥ h)       Algo. 2 (s ≥ h)
    fast CRC        Algo. 3 (s < h)       Algo. 3 (s < h)
                    Algo. 4 (s ≥ h)       Algo. 2 (s ≥ h)

Fig. 10 CRC algorithms used in software implementation.



Remark 5. Rules (R1) and (R2) serve as a simple technique for comparing the complexity of different CRCs,
i.e., they will be used to obtain a first-order estimation of the ratio $e_b/e_f$. These rules are intended only for
CRC algorithms that are implemented in C, and not for other types of algorithms or other programming
languages. As seen in the following, our CRC software implementation uses only a small number of elementary
C operators (namely, <<, >>, =, ==, !=, <, <=, &, and ^) and C keywords (namely, char, short, int,
long, unsigned, if, else, for, while, and return). Our following C programs (using the big-endian convention)
for the CRCs are written in a style that is intended to be simple and straightforward. See also Remark 6.

Other techniques for counting operations are also possible. For example, consider rule (R1'), which is
defined as rule (R1) but also counts the equal sign (=) as an operation. Let $e'_b$ and $e'_f$ denote the resulting
operation counts under (R1'). We must have $e'_b > e_b$ and $e'_f > e_f$. Although the difference between $e_b$
and $e'_b$ (as well as between $e_f$ and $e'_f$) can be significant, the difference between the ratios $e_b/e_f$ and $e'_b/e'_f$
is typically not significant. For example, let $s = h = 32$. From Fig. 9, we have $e_b = 44.8$, $e_f = 3$,
and $e_b/e_f = 14.9$. Under rule (R1'), it can be shown that $e'_b = 61.5$, $e'_f = 4.25$, and $e'_b/e'_f = 14.5$, i.e.,
$e'_b/e'_f \approx e_b/e_f$. Note that rule (R1), which is used in this paper, is slightly simpler to use than rule (R1').
Thus, our technique for counting software operations is reasonable for the purpose of complexity comparison,
i.e., we are more interested in the ratio $e_b/e_f$ than in $e_b$ and $e_f$ themselves.

Here, for simplicity, we assign the same unit cost to each operation. A more elaborate technique would
assign different costs to different operations. However, this assignment depends on many factors (such as
computer hardware, operating system, processor architecture, and memory organization), which are outside
the focus of this paper. □

Let us now compute $x$ and $y$ in (32). The computation of $r$ is deferred to later subsections. First,
consider Fig. 11, which shows the C program for bitwise implementation of the basic CRC for the case
$s < h$. As indicated in Fig. 10, this program is based on Algorithm 1. In this program, we assume that
$h \in \{8, 16, 32, 64\}$, i.e., $h$ is the size (in bits) of one of the natural unsigned types of C: unsigned char, unsigned
short int, unsigned int, or unsigned long int. The input is the $n$ message $s$-tuples $Q[0], Q[1], \ldots, Q[n-1]$,
and the output is the CRC check $h$-tuple $P$.

We then apply rules (R1) and (R2) to the program shown in Fig. 11 to obtain the desired operation
counts. The non-zero operation count for each program statement is recorded between the comment quotes
(/* */). Recall that the total operation count for computing the check tuple $P$ from the $n$ input tuples is
$t_b = (x_b + r_b)n + y_b$. Here, $r_b$ is the operation count required for computing $B(X)$, which is inside the loop
indexed by $i$, $x_b$ is the operation count required for computing all the other terms in the loop besides $B(X)$,
and $y_b$ is the operation count required for computing all the terms outside the loop. From Fig. 11, we have
$x_b = 4$ and $y_b = 0$.

To summarize, for $h \in \{8, 16, 32, 64\}$ and $s < h$, we have $x_b = 4$ and $y_b = 0$, which are recorded in Fig. 13.

For illustration, we let $h = 16$, $s = 8$ and $M(X) = X^{16} + X^{15} + X^2 + 1$ (which generates the well-known
CRC-16) in Fig. 11. When $h \notin \{8, 16, 32, 64\}$ and $s < h$, the computational complexity is slightly higher,
namely, $x_b = 4$ and $y_b = 1$.

Fig. 7 shows the C program for the basic CRC when $s = h$. Next, consider Fig. 12, which shows the C
program for the fast CRC when $h \in \{8, 16, 32, 64\}$ and $s < h - 1$. It can be shown that $x_f = 6$ and $y_f = 0$
for this case. Again, for illustration, we let $h = 16$ and $s = 8$ in Fig. 12. Similarly, we can compute the
values of $x$ and $y$ for all the cases for both the basic CRCs and the fast CRCs. The results are summarized
in Fig. 13.

Using Fig. 13, the expression (32) can be simplified as follows. Let $z$ be the ratio of the 2 terms on the
right-hand side of (32), i.e.,

$$z = \frac{8(x + r)/s}{(8y)/(ns)} = \frac{(x + r)n}{y}$$

From Fig. 13, we have $0 \le y \le 1$ and $x \ge 3$, which implies that $z \ge (x + r)n \ge (3 + r)n > 3n$. Thus,
$8y/(ns)$ is much smaller than $8(x + r)/s$ because we assume in this paper that $n$ is not too small (i.e., we
assume that $n > 4$). Thus, the term $8y/(ns)$ can be dropped from (32). The operation count per input byte
required for computing the CRC check $h$-tuple then simplifies to

$$e = \frac{8(x + r)}{s} \tag{33}$$

where $x$ is determined from Fig. 13, which depends only on $s$ and $h$, i.e., $x = x(s, h)$. Recall that $r$ denotes the
operation count required for computing $B(X)$, which also depends only on $s$ and $h$ [see (11)], i.e., $r = r(s, h)$.
It follows from (33) that $e$ now also depends only on $s$ and $h$, i.e., $e = e(s, h)$. From (33), we also have

$$\frac{e_b}{e_f} = \frac{x_b + r_b}{x_f + r_f}$$

where $x_b$ and $x_f$ are given in Fig. 13. In the following, using rules (R1) and (R2), we compute $r_b$ and $r_f$ for
both the bitwise and the table-lookup techniques.
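
To see how little the dropped term matters, the following C sketch evaluates (32) and (33) side by side. The function names e_exact and e_simplified are hypothetical, and the numerical values used in the example after the block are chosen only for illustration.

```c
/* (32): e = 8(x + r)/s + 8y/(ns). */
double e_exact(double x, double r, double y, int s, int n)
{
    return 8.0 * (x + r) / s + 8.0 * y / ((double)n * s);
}

/* (33): the same expression with the term 8y/(ns) dropped. */
double e_simplified(double x, double r, int s)
{
    return 8.0 * (x + r) / s;
}
```

For example, with $x = 4$, $r = 44$, $y = 1$, $s = 8$, and a message of only $n = 4$ tuples, e_exact returns 48.25 while e_simplified returns 48, a difference of about 0.5%; the difference shrinks further as $n$ grows.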



    unsigned short basic_CRC (int n, unsigned char *Q)
    {
        int i, j, hs, s;
        unsigned short A, B, M, K, P;
        s = 8;
        hs = 8;                                  /* hs = h-s */
        M = 0x8005;                              /* M = X^16+X^15+X^2+1 */
        K = 0x8000;                              /* K = 2^(h-1), h=16 */
        P = 0;
        for (i=0; i<n; i=i+1)                    /* 2 */
        {
            P = P ^ (Q[i] << hs);                /* 2 */
            for (j=0; j<s; j=j+1)                /* 2 */
            {
                if ((P&K) != 0) P = (P<<1) ^ M;  /* 3.5 */
                else            P = P<<1;
            }
        }
        return P;
    }

Fig. 11 C program for the basic $h$-bit CRC ($s < h$).

    unsigned short fast_CRC (int n, unsigned char *Q)
    {
        int i, hs, s;
        unsigned short A, B, P;
        s = 8;
        hs = 8;                                  /* hs = h-s, h = 16 */
        P = 0;
        for (i=0; i<n; i=i+1)                    /* 2 */
        {
            A = (P>>hs) ^ Q[i];                  /* 2 */
            B = (A<<2) ^ (A<<1) ^ A;             /* 4 */
            P = B ^ (P<<s);                      /* 2 */
        }
        return P;
    }

Fig. 12 C program for the fast $h$-bit CRC ($s < h - 1$).

                      x                              y
    Algo. 1 (s < h)   4                              0 if h = 8, 16, 32, 64
                                                     1 if h ≠ 8, 16, 32, 64
    Algo. 2 (s ≥ h)   3 if s = h
                      4 if s > h
    Algo. 3 (s < h)   6 if h = 8, 16, 32, 64         0 if h = 8, 16, 32, 64
                      7 if h ≠ 8, 16, 32, 64         1 if h ≠ 8, 16, 32, 64
    Algo. 4 (s ≥ h)   3                              0 if s = h
                                                     1 if s > h

Fig. 13 Values of x and y.
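
The programs of Figs. 11 and 12 can be cross-checked against each other: for $h = 16$ and $s = 8$, the loop-free fast program should return the same check tuple as the bitwise program run with the generator polynomial $F_{16}(X) = X^{16} + X^2 + X + 1$. The C sketch below illustrates this; the function names, the fixed-width types, and the extra parameter m are our own assumptions.

```c
#include <stdint.h>

/* Bitwise program in the style of Fig. 11 (h = 16, s = 8); the low 16 bits
   of the generator polynomial are passed in m (hypothetical parameter). */
uint16_t crc16_bytes_bitwise(const uint8_t *q, int n, uint16_t m)
{
    uint16_t p = 0;
    int i, j;
    for (i = 0; i < n; i = i + 1) {
        p = (uint16_t)(p ^ ((uint16_t)q[i] << 8));
        for (j = 0; j < 8; j = j + 1) {
            if ((p & 0x8000u) != 0) p = (uint16_t)((p << 1) ^ m);
            else                    p = (uint16_t)(p << 1);
        }
    }
    return p;
}

/* Fast program in the style of Fig. 12. No inner loop is needed: A has
   only 8 bits, so A(X)(X^2 + X + 1) has degree at most 9 < 16 and
   requires no reduction. */
uint16_t crc16_bytes_fast(const uint8_t *q, int n)
{
    uint16_t a, b, p = 0;
    int i;
    for (i = 0; i < n; i = i + 1) {
        a = (uint16_t)((p >> 8) ^ q[i]);
        b = (uint16_t)((a << 2) ^ (a << 1) ^ a);
        p = (uint16_t)(b ^ (p << 8));
    }
    return p;
}
```

The comparison makes the source of the speedup visible: the 8 division steps per byte in the first function are replaced by 2 shifts and 2 XORs in the second.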



A.1 CRC Software Implementation: Bitwise Technique (Without Table Lookup)

According to Fig. 10, the bitwise implementation of the basic CRCs uses Algorithm 1 for $s < h$, and Algorithm 4 for $s \ge h$. From Fig. 13, we then have

$$x_b = \begin{cases} 4 & \text{if } s < h \\ 3 & \text{if } s \ge h \end{cases}$$

Substituting $x_b$ into (33), we have

$$e_b = \begin{cases} 8(4 + r_b)/s & \text{if } s < h \\ 8(3 + r_b)/s & \text{if } s \ge h \end{cases} \tag{34}$$

where $r_b$ denotes the operation count required for computing $B(X)$ of the basic CRCs.

According to Fig. 10, the bitwise implementation of the fast CRCs uses Algorithm 3 for $s < h$, and Algorithm 4 for $s \ge h$. From Fig. 13, we then have

$$x_f = \begin{cases} 6 & \text{if } s < h \text{ and } h = 8, 16, 32, 64 \\ 7 & \text{if } s < h \text{ and } h \ne 8, 16, 32, 64 \\ 3 & \text{if } s \ge h \end{cases}$$

Substituting $x_f$ into (33), we have

$$e_f = \begin{cases} 8(6 + r_f)/s & \text{if } s < h \text{ and } h = 8, 16, 32, 64 \\ 8(7 + r_f)/s & \text{if } s < h \text{ and } h \ne 8, 16, 32, 64 \\ 8(3 + r_f)/s & \text{if } s \ge h \end{cases} \tag{35}$$

where $r_f$ denotes the operation count required for computing $B(X)$ of the fast CRCs. Both $r_b$ and $r_f$ are computed in the following subsections.
A.1.1 Basic CRCs

Recall that $B(X)$ for the basic CRCs is given in Definition 1. First, consider the case $s < h$, and let us revisit Fig. 11. This figure contains the loop (indexed by j) for computing $B(X)$, which is based on Remark 3. The figure shows that the operation count required for computing $B(X)$ is $r_b = 5.5s$. Next, for the case $s \ge h$, it can also be shown that $r_b = 5.5s$ (see Fig. 7). To summarize, we have

$$r_b = 5.5s \tag{36}$$

Substituting (36) into (34), we have

$$e_b = \begin{cases} 8(4 + 5.5s)/s & \text{if } s < h \\ 8(3 + 5.5s)/s & \text{if } s \ge h \end{cases} \tag{37}$$

Note that (36) is derived from the C programs that do not use loop unrolling (which is also the case for the C programs presented in [2]). If loop unrolling is used, (36) reduces to $r_b = 3.5s$.

Here, our software implementations of the basic CRCs are general, i.e., they are applicable to all generator polynomials $M(X)$ and to a wide range of processor architectures. For some specific generator polynomials that have some desirable properties, alternative implementations (such as shift and add [5], and on the fly [17, 18]) may have lower complexity. Thus, we concentrate on the general nature of the algorithms rather than attempting to deal with specific types of generator polynomials. Also, for our C programs, we are more concerned with their readability and less concerned with optimization techniques such as loop unrolling and use of register variables (see Remark 4).



A.1.2 Fast CRCs

Recall that $B(X)$ of the fast CRCs is given in Theorem 1. First, assume that $s < h - 1$. The C program for this case is shown in Fig. 12, which contains the procedure for computing $B(X)$. Applying rules (R1) and (R2) to Fig. 12, we observe that the operation count required for computing $B(X)$ is $r_f = 4$. Next, assume that $s = h$. The C program for this case is shown in Fig. 8, which yields $r_f = 9$. The C programs for all the other cases can also be written, and the resulting software complexity can also be determined. Following is the list of the operation counts for all the cases:

$$r_f = \begin{cases} 4 & \text{if } s < h - 1 \\ 6.5 & \text{if } s = h - 1 \\ 9 & \text{if } s = h \\ 9 + 5.5(s - h) & \text{if } s > h \end{cases} \tag{38}$$

Substituting (38) into (35), we have

$$e_f = \begin{cases} 80/s & \text{if } s < h - 1 \text{ and } h = 8, 16, 32, 64 \\ 88/s & \text{if } s < h - 1 \text{ and } h \ne 8, 16, 32, 64 \\ 100/s & \text{if } s = h - 1 \text{ and } h = 8, 16, 32, 64 \\ 108/s & \text{if } s = h - 1 \text{ and } h \ne 8, 16, 32, 64 \\ 96/s & \text{if } s = h \\ 8[12 + 5.5(s - h)]/s & \text{if } s > h \end{cases} \tag{39}$$

The operation count per input byte $e_f$ for the fast h-bit CRC given in (39) is a function of $s$, which is the size of each input tuple $Q_i(X)$. We now determine the value of $s$ that minimizes $e_f$. These optimal values are denoted by $s^*$ and $e_f^*$.



First, assume that $h \in \{8, 16, 32, 64\}$. For each $h \in \{8, 16, 32, 64\}$, we can search for an $s \in \{1, 2, \ldots, 64\}$ such that $e_f$ in (39) is minimized. Our search shows that

$$e_f^* = \begin{cases} 80/(h - 2) & \text{if } h = 16, 32, 64 \\ 96/h & \text{if } h = 8 \end{cases} \tag{40}$$

which is achieved when

$$s^* = \begin{cases} h - 2 & \text{if } h = 16, 32, 64 \\ h & \text{if } h = 8 \end{cases} \tag{41}$$

Next, assume that $h \notin \{8, 16, 32, 64\}$. For each $h \notin \{8, 16, 32, 64\}$, $4 \le h \le 64$, we can search for an $s \in \{1, 2, \ldots, 64\}$ such that $e_f$ in (39) is minimized. Our search shows that

$$e_f^* = \begin{cases} 88/(h - 2) & \text{if } h \ge 24 \\ 96/h & \text{if } 4 \le h < 24 \end{cases} \tag{42}$$

which is achieved when

$$s^* = \begin{cases} h - 2 & \text{if } h \ge 24 \\ h & \text{if } 4 \le h < 24 \end{cases} \tag{43}$$

Thus, (41) and (43) show that the complexity of the fast h-bit CRCs is minimized (i.e., $e_f$ is minimized) at either $s = h$ or $s = h - 2$, where $s$ is the number of bits in each input tuple $Q_i(X)$.

For example, by letting $h = 16$, the optimal size for each input tuple $Q_i(X)$ is $s^* = h - 2 = 14$ [by (41)], and the corresponding minimum operation count is $e_f^* = 80/(h - 2) = 80/14 = 5.71$ [by (40)]. Information on computers is typically organized in bytes or words. Thus, it is of interest to determine the optimal value of $e_f$ when $s$ is restricted to a multiple of byte size and word size, i.e., when $s$ is a multiple of 8, 16, 32, 64. These optimal values, which are obtained from (39), are shown in Fig. 14.

Recall that $e_f = e_f(s, h)$, i.e., $e_f$ is a function of $s$ and $h$. In Fig. 14, for a given $h$, $s^{(opt)}$ denotes the value of $s$, $1 \le s \le 64$, that minimizes $e_f(s, h)$, and the corresponding minimum $e_f(s, h)$ is denoted by $e_f^{(opt)}$. Thus, we have $e_f^{(opt)} = e_f(s^{(opt)}, h) \le e_f(s, h)$ for all $1 \le s \le 64$. Similarly, $s^{(byte)}$ denotes the value of $s \in \{8, 16, 24, 32, 40, 48, 56, 64\}$ that minimizes $e_f(s, h)$, and the corresponding minimum $e_f(s, h)$ is denoted by $e_f^{(byte)}$. Finally, $s^{(word)}$ denotes the value of $s \in \{8, 16, 32, 64\}$ that minimizes $e_f(s, h)$, and the corresponding minimum $e_f(s, h)$ is denoted by $e_f^{(word)}$.

For example, by letting $h = 64$, we have $s^{(opt)} = 62$, $e_f^{(opt)} = 1.29$, $s^{(byte)} = 56$, $e_f^{(byte)} = 1.43$, $s^{(word)} = 64$, $e_f^{(word)} = 1.50$. In general, we must have $e_f^{(opt)} \le e_f^{(byte)} \le e_f^{(word)}$.

 h    s(opt)   s(byte)   s(word)   e_f(opt)   e_f(byte)   e_f(word)
 4      4        8         8        24.0       34.0        34.0
 6      6        8         8        16.0       23.0        23.0
 8      8        8         8        12.0       12.0        12.0
10     10        8         8        9.60       11.0        11.0
12     12        8         8        8.00       11.0        11.0
16     14       16        16        5.71       6.00        6.00
20     20       16        16        4.80       5.50        5.50
24     22       24        16        4.00       4.00        5.50
32     30       32        32        2.67       3.00        3.00
40     38       40        32        2.32       2.40        2.75
48     46       48        32        1.91       2.00        2.75
56     54       56        32        1.63       1.71        2.75
64     62       56        64        1.29       1.43        1.50

Fig. 14 The optimal values of s and e_f for the h-bit fast CRCs
(s(opt) = best of s ∈ {1, 2, . . . , 63, 64}, s(byte) = best of s ∈ {8, 16, 24, 32, 40, 48, 56, 64},
s(word) = best of s ∈ {8, 16, 32, 64})
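The search behind (40)-(43) and Fig. 14 can be sketched in a few lines of C. This is only an illustration under our own assumptions: the names `r_f`, `e_f`, and `best_s` are hypothetical helpers that transcribe (38) and (35), and ties are resolved in favor of the smaller $s$.

```c
/* 1 if h is one of the word sizes 8, 16, 32, 64; 0 otherwise. */
static int is_word_size(int h)
{
    return h == 8 || h == 16 || h == 32 || h == 64;
}

/* Operation count r_f for computing B(X), transcribed from (38). */
double r_f(int s, int h)
{
    if (s < h - 1)  return 4.0;
    if (s == h - 1) return 6.5;
    if (s == h)     return 9.0;
    return 9.0 + 5.5 * (s - h);
}

/* Operation count per input byte e_f(s, h), i.e., (35) with (38) substituted. */
double e_f(int s, int h)
{
    double x = (s >= h) ? 3.0 : (is_word_size(h) ? 6.0 : 7.0);
    return 8.0 * (x + r_f(s, h)) / s;
}

/* Value of s in {step, 2*step, ..., 64} that minimizes e_f(s, h).
   step = 1 corresponds to s(opt), step = 8 to s(byte). */
int best_s(int h, int step)
{
    int best = step;
    for (int s = step; s <= 64; s += step)
        if (e_f(s, h) < e_f(best, h))
            best = s;
    return best;
}
```

For instance, `best_s(16, 1)` returns 14 and `best_s(64, 8)` returns 56, in agreement with the corresponding rows of Fig. 14.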



Remark 6. Our C programs for the CRCs, which follow directly from the pseudocodes in Figs. 1-4, are written in a style that is intended to be simple and straightforward. For readability, we use an array (instead of a pointer) for the input s-tuples Qi. We also avoid using any C syntax that obscures the operation counts. For example, the more explicit syntax if ((P&K) != 0) is used instead of the shorthand if (P&K). Although these 2 expressions are equivalent, the former shows 2 operations more clearly. If desired, these C programs can be rewritten in pointer and shorthand style, for example, as shown in Figs. 15 and 16, which are equivalent to Figs. 7 and 8, respectively. □



#define M 0x8005                 /* M = X^16 + X^15 + X^2 + 1 */
#define K 0x8000                 /* K = 2^(h-1), h=s=16 */
#define s 16

unsigned short basic_CRC (int n, unsigned short *Q)
{
    register int j;
    register unsigned short P, *Qi, *Qn;
    Qi = Q;
    Qn = Q + n;                              /* 1 */
    P = 0;
    while (Qi < Qn)                          /* 1 */
    {
        P ^= *Qi++;                          /* 2 */
        for (j=0; j<s; j++)                  /* 2 */
        {
            if (P&K) P = (P<<1) ^ M;         /* 3.5 */
            else P = P<<1;
        }
    }
    return P;
}

Fig. 15 C program for the basic h-bit CRC in pointer style (s = h).



#define F 0x7                    /* F = X^16 + X^2 + X + 1 */
#define K 0x8000                 /* K = 2^(h-1), h=s=16 */
#define s 16

unsigned short fast_CRC (int n, unsigned short *Q)
{
    register unsigned short A, C, P, *Qi, *Qn;
    Qi = Q;
    Qn = Q + n;                              /* 1 */
    P = 0;
    while (Qi < Qn)                          /* 1 */
    {
        A = P ^ *Qi++;                       /* 2 */
        if (A&K) C = (A<<1) ^ F;             /* 3.5 */
        else C = A<<1;
        if (C&K) P = (C<<1) ^ F;             /* 3.5 */
        else P = C<<1;
        P ^= C ^ A;                          /* 2 */
    }
    return P;
}

Fig. 16 C program for the fast h-bit CRC in pointer style (s = h).

Remark 7. When $s$ is small, we can compute $B(X) = R_{M(X)}[A(X)X^h]$, where $\mathrm{degree}(A(X)) < s$, using a series of if-else statements as follows. For example, suppose that $s = 2$ and $M(X) = F_h(X) = X^h + X^2 + X + 1$. Then $A(X) \in \{0, 1, X, X + 1\}$, and it can be shown that

$$B(X) = \begin{cases} 0 & \text{if } A(X) = 0 \\ X^2 + X + 1 & \text{if } A(X) = 1 \\ X^3 + X^2 + X & \text{if } A(X) = X \\ X^3 + 1 & \text{if } A(X) = X + 1 \end{cases}$$

Note that polynomials can also be represented as integer numbers, e.g., the polynomial $X^3 + X^2 + X$ is equivalent to the decimal number 14. Thus, $B(X)$ can be computed using the C program segment shown in Fig. 17. Applying rules (R1) and (R2) to this C program segment, the operation count for computing $B(X)$ is 1, 2, 3, or 3 if $A(X)$ is 0, 1, 2, or 3 (in integer representation), respectively. We now assume that the bits 0 and 1 of the input message occur equally likely. Thus, $A(X)$ assumes one of the values 0, 1, 2, 3 with equal probability of 1/4. Then, on the average, the operation count for computing $B(X)$ is (1 + 2 + 3 + 3)/4 = 2.25. In general, this technique for computing $B(X)$ can be applied to any generator polynomial $M(X)$.

Let $k$ denote the operation count required for computing $B(X) = R_{M(X)}[A(X)X^h]$ using this if-else technique, where $\mathrm{degree}(A(X)) < s$. Note that $k$ depends on $s$, i.e., $k = k(s)$. As shown above, we then have $k(2) = 2.25$, which is smaller than both $r_b$ and $r_f$ given in (36) and (38). In general, it can be shown that $k(s) = 2^{s-1} + 2^{-1} - 2^{-s}$ for $s \ge 1$. In particular, $k(1) = 1$. Thus, this if-else technique is effective for small $s$, such as $s$ = 1, 2, or 3. However, in this paper, we are mainly concerned with the case $s \ge 8$, which is more commonly used in practice. For this case, $k(s)$ is much greater than both $r_b$ and $r_f$. Thus, when $s \ge 8$, the if-else technique is much more expensive than the basic and the new techniques, and it will not be discussed further in this paper. Note also that this if-else technique is different from the table-lookup technique (which will be discussed later).



if (A == 0) B = 0;                 /* 2.25 */
else
{
    if (A == 1) B = 7;
    else
    {
        if (A == 2) B = 14;
        else B = 9;
    }
}

Fig. 17 C program segment for computing B(X) = R_M(X)[A(X)X^h] when s = 2.

□
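The if-else technique of Remark 7 can be checked with a short sketch. The function names below (`b_if_else`, `b_product`, `k_average`) are hypothetical and not part of the original programs. We assume $h \ge 4$, so that $A(X)(X^2 + X + 1)$ has degree less than $h$ and needs no reduction, and we assume a chain of $2^s - 1$ comparisons in which the last two values of $A$ share the final comparison.

```c
/* B(X) for s = 2 via the if-else chain of Fig. 17. */
unsigned b_if_else(unsigned A)
{
    if (A == 0) return 0;
    else if (A == 1) return 7;
    else if (A == 2) return 14;
    else return 9;
}

/* The same B(X) computed as the carry-less product A(X)(X^2 + X + 1). */
unsigned b_product(unsigned A)
{
    return (A << 2) ^ (A << 1) ^ A;
}

/* Average number of comparisons k(s) of an if-else chain over the 2^s
   equally likely values of A: value a costs a + 1 comparisons, except the
   last value, which shares the final comparison with its predecessor. */
double k_average(int s)
{
    unsigned n = 1u << s;
    double total = 0.0;
    for (unsigned a = 0; a < n; a++)
        total += (a < n - 1) ? (double)(a + 1) : (double)(n - 1);
    return total / n;
}
```

Here `k_average(2)` evaluates to 2.25 and `k_average(1)` to 1, matching the closed form $k(s) = 2^{s-1} + 2^{-1} - 2^{-s}$.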

Remark 8. Consider an input message $U(X)$, which is protected by an h-bit CRC. Recall that we implement this CRC by first dividing the input message into $n$ s-tuples $Q_i(X)$, i.e., we have $U(X) = (Q_0(X), Q_1(X), \ldots, Q_{n-1}(X))$. These s-tuples $Q_i(X)$ then become the input to one of the CRC algorithms. Fig. 9 shows that the complexity of the basic CRCs is rather insensitive to the values of $s$, whereas the complexity of the fast CRCs is very sensitive to the values of $s$. Recall that the operation count per input byte $e_f$ in (39) is a function of $s$ and $h$, i.e., $e_f = e_f(s, h)$. For example, Fig. 9 shows that $e_f(8, 16) = 10$ and $e_f(16, 16) = 6$, i.e., $e_f(16, 16) < e_f(8, 16)$.

So far, we have not addressed the cost of obtaining the tuples $Q_i(X)$. We now address the impact of this cost by considering the fast 16-bit CRC, i.e., $h = 16$. Suppose that the input message $U(X)$ originally consists of $m$ bytes, $m \ge 4$, denoted by $I_0(X), I_1(X), \ldots, I_{m-1}(X)$. Each $I_i(X)$ is an 8-tuple. Thus, we need to organize the bytes $I_i(X)$ into the s-tuples $Q_i(X)$. One technique is to simply set $Q_i(X) = I_i(X)$, i.e., each $Q_i(X)$ is an 8-tuple. Let $e$ be the operation count per input byte required for CRC encoding. We then have $s = 8$, and hence $e = e_f(8, 16) = 10$.

An alternative technique is first to pair 2 adjacent input bytes to form 16-bit tuples from which the check bits are then computed. More precisely, we now let $s = 16$ and define the new 16-tuples $Q_i(X)$ by

$$Q_0(X) = \begin{cases} (I_0(X), I_1(X)) & \text{if } m \text{ is even} \\ (0, I_0(X)) & \text{if } m \text{ is odd} \end{cases}$$

and

$$Q_i(X) = \begin{cases} (I_{2i}(X), I_{2i+1}(X)) & \text{if } m \text{ is even} \\ (I_{2i-1}(X), I_{2i}(X)) & \text{if } m \text{ is odd} \end{cases}$$

for

$$0 < i \le \begin{cases} (m - 2)/2 & \text{if } m \text{ is even} \\ (m - 1)/2 & \text{if } m \text{ is odd} \end{cases}$$

The algorithm for pairing the bytes and then computing the fast 16-bit CRC is shown in Fig. 18. Using this algorithm, it can be shown that the operation count per input byte is $e = 7.5$, which is lower than $e_f(8, 16) = 10$ of the non-pairing technique. Note that $e = 7.5 > e_f(16, 16) = 6$, because of the additional cost for pairing the input bytes to form the new 16-bit tuples to be used for the CRC computation.



if (m is even)
    {Q = I_0 X^8 + I_1; i = 2;}
else
    {Q = I_0; i = 1;}
B = R_F16 [Q(X^2 + X + 1)];
while (i < m - 1)
{
    Q = I_i X^8; i = i + 1;
    Q = Q + I_i; i = i + 1;
    A = B + Q;
    B = R_F16 [A(X^2 + X + 1)];
}
P = B;
return P;

Fig. 18 Algorithm for computing the fast 16-bit CRC directly from the m input bytes
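The pairing technique of Remark 8 can be sketched in C as follows. This is an illustration under our own assumptions rather than the program used for the operation counts: the names `fast16_step`, `fast_crc16_from_bytes`, and `bytewise_crc16` are hypothetical, the step function follows the s = h = 16 pattern of Fig. 16, and the last routine is a plain bit-by-bit division used only as a cross-check.

```c
#include <stddef.h>
#include <stdint.h>

/* B = R_F16[A(X)(X^2 + X + 1)], computed as in Fig. 16 (s = h = 16). */
static uint16_t fast16_step(uint16_t A)
{
    uint16_t C = (A & 0x8000u) ? (uint16_t)((A << 1) ^ 0x7u) : (uint16_t)(A << 1);
    uint16_t D = (C & 0x8000u) ? (uint16_t)((C << 1) ^ 0x7u) : (uint16_t)(C << 1);
    return (uint16_t)(D ^ C ^ A);
}

/* Pair adjacent bytes into 16-bit tuples and compute the fast 16-bit CRC.
   If m is odd, the first tuple is (0, I[0]), as in the definition of Q_0. */
uint16_t fast_crc16_from_bytes(const uint8_t *I, size_t m)
{
    size_t i;
    uint16_t Q, B;
    if (m == 0) return 0;
    if (m % 2 == 0) { Q = (uint16_t)((I[0] << 8) | I[1]); i = 2; }
    else            { Q = I[0];                           i = 1; }
    B = fast16_step(Q);
    while (i + 1 < m) {                       /* same test as i < m - 1 */
        Q = (uint16_t)((I[i] << 8) | I[i + 1]);
        i += 2;
        B = fast16_step((uint16_t)(B ^ Q));
    }
    return B;
}

/* Cross-check: bit-by-bit division by X^16 + X^2 + X + 1 over the m bytes. */
uint16_t bytewise_crc16(const uint8_t *I, size_t m)
{
    uint16_t P = 0;
    for (size_t i = 0; i < m; i++) {
        P ^= (uint16_t)(I[i] << 8);
        for (int j = 0; j < 8; j++)
            P = (P & 0x8000u) ? (uint16_t)((P << 1) ^ 0x7u) : (uint16_t)(P << 1);
    }
    return P;
}
```

Because a leading zero byte leaves a zero register unchanged, the paired computation should agree with the byte-wise division for both even and odd $m$.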



A.2 CRC Software Implementation: Table-Lookup Technique

Recall that the complexity of the fast CRCs is low even without using table lookup. With table lookup, the operation count is reduced at the cost of additional memory resource. Although our focus in this paper is on bitwise algorithms, we now also present table-lookup algorithms to illustrate tradeoffs between operation count and table size. Our formulation and results here are straightforward generalizations or variations of well-known results, which are available in [2, 5, 11, 17, 18, 20]. Note that, with table lookup, speed directly correlates with operation count under ideal conditions (e.g., the table is stored in the fastest cache, and there is no cache miss). Otherwise, speed may not correlate directly with operation count (e.g., when the impact of cache miss is not negligible [5]).

For table-lookup implementation, according to Fig. 10, we use Algorithm 3 (when $s < h$) and Algorithm 2 (when $s \ge h$) for both the basic CRCs and the fast CRCs. From (11), we then have

$$B(X) = R_{M(X)}[A(X)X^h]$$

where $\mathrm{degree}(A(X)) < s$. In the following, $B(X)$ is computed by table lookup. Let $g_b$ and $g_f$ be the total number of table entries for the basic CRCs and the fast CRCs, respectively.

A.2.1 Basic CRCs

According to Fig. 10, Algorithms 2 and 3 are used for the basic CRCs. Substituting the values of $x$ from Fig. 13 for Algorithms 2 and 3 into (33), we have

$$e_b = \begin{cases} 8(6 + r_b)/s & \text{if } s < h \text{ and } h = 8, 16, 32, 64 \\ 8(7 + r_b)/s & \text{if } s < h \text{ and } h \ne 8, 16, 32, 64 \\ 8(3 + r_b)/s & \text{if } s = h \\ 8(4 + r_b)/s & \text{if } s > h \end{cases} \tag{44}$$

where $r_b$ is the operation count required for computing $B(X)$ via table lookup. The required tables are defined below. First, we write

$$s = t_1 + t_2 + \cdots + t_m$$

for some $m$ and $t_i$ such that $1 \le m \le s$ and $1 \le t_i \le s$ ($i = 1, 2, \ldots, m$). Next, we decompose $A(X)$ into $m$ polynomials $A_1(X), A_2(X), \ldots, A_{m-1}(X), A_m(X)$ such that

$$A(X) = A_1(X)X^{t_2 + t_3 + \cdots + t_m} + A_2(X)X^{t_3 + t_4 + \cdots + t_m} + \cdots + A_{m-1}(X)X^{t_m} + A_m(X) = \sum_{i=1}^{m-1} A_i(X)X^{t_{i+1} + \cdots + t_m} + A_m(X)$$

with $\mathrm{degree}(A_i(X)) < t_i$, $i = 1, 2, \ldots, m$. We then have

$$\begin{aligned} B(X) &= R_{M(X)}[A(X)X^h] \\ &= R_{M(X)}\Big[\sum_{i=1}^{m-1} A_i(X)X^{t_{i+1} + \cdots + t_m + h} + A_m(X)X^h\Big] \\ &= \sum_{i=1}^{m-1} R_{M(X)}\big[A_i(X)X^{t_{i+1} + \cdots + t_m + h}\big] + R_{M(X)}\big[A_m(X)X^h\big] \\ &= \sum_{i=1}^{m} T_i[A_i] \end{aligned} \tag{45}$$

where the tables $T_i[\,]$ are defined by

$$T_i[A_i] = \begin{cases} R_{M(X)}\big[A_i(X)X^{t_{i+1} + \cdots + t_m + h}\big] & \text{if } 1 \le i < m \\ R_{M(X)}\big[A_m(X)X^h\big] & \text{if } i = m \end{cases} \tag{46}$$

where $A_i$ denotes the $t_i$-tuple that is composed of the binary coefficients of $A_i(X)$. For example, if $t_i = 4$ and $A_i(X) = X^2 + 1$, then $A_i = (0101)$, which is equivalent to the decimal integer 5.

Thus, regardless of whether $s < h$ or $s \ge h$, the table $T_i[\,]$ has $2^{t_i}$ entries, each entry is an h-tuple. (For example, let $h = 16$ and $t_i = 8$. The table $T_i[\,]$ then has $2^8$ entries, 16 bits each, i.e., the total memory storage for this particular table is $2^8 \times 16$ bits = 512 bytes). Finally, the total number of entries for the $m$ tables, denoted by $g_b$, is

$$g_b = \sum_{i=1}^{m-1} 2^{t_i} + 2^{t_m} = \sum_{i=1}^{m} 2^{t_i} \tag{47}$$

To summarize, for a given polynomial $A(X)$ of degree less than $s$, let $m, t_1, t_2, \ldots, t_m$ be such that $s = t_1 + t_2 + \cdots + t_m$. The term

$$B(X) = R_{M(X)}[A(X)X^h]$$

can then be computed using the $m$ tables defined by (46). The total number of entries for these tables is $g_b = \sum_{i=1}^{m} 2^{t_i}$. Further, regardless of whether $s < h$ or $s \ge h$, it can be shown that, using the $m$ tables, the number of operations required for computing $B(X)$ is

$$r_b = 3(m - 1) \tag{48}$$

We now consider the special case $t_1 = t_2 = \cdots = t_m = s/m$. The $m$ tables defined in (46) then become

$$T_i[A_i] = R_{M(X)}\big[A_i(X)X^{h + s(m - i)/m}\big]$$

$i = 1, 2, \ldots, m$. Each of the $m$ tables has $2^{s/m}$ entries. From (47), the total number of table entries is

$$g_b = \sum_{i=1}^{m} 2^{s/m} = m2^{s/m} \tag{49}$$

Equations (48) and (49) show tradeoffs between the operation count $r_b$ and the table size $g_b$. That is, to decrease the table size, we must increase $m$ in $g_b = m2^{s/m}$, and this in turn will increase the operation count $r_b = 3(m - 1)$. Thus, smaller (larger) table size $g_b$ will yield larger (smaller) operation count $r_b$. In particular, when $m = 1$, we have $r_b = 0$ and $g_b = 2^s$. When $m = s$, we have $r_b = 3(s - 1)$ and $g_b = 2s$. Substituting $r_b = 3(m - 1)$ into (44), we have

$$e_b = \begin{cases} (24m + 24)/s & \text{if } s < h \text{ and } h = 8, 16, 32, 64 \\ (24m + 32)/s & \text{if } s < h \text{ and } h \ne 8, 16, 32, 64 \\ 24m/s & \text{if } s = h \\ (24m + 8)/s & \text{if } s > h \end{cases} \tag{50}$$
Note that our formulation is a straightforward generalization of [18], which contains the results for the special cases $h = 16$, $s \in \{8, 16\}$, and $m \in \{1, s\}$. Our results [e.g., (49)] also resemble those of [11], which presents in-depth studies of the case $h = 32$.

Note that the function $f(x) = 2^x$ is convex. Given $s = \sum_{i=1}^{m} t_i$, from Jensen's inequality, it can then be shown that $\frac{1}{m}\sum_{i=1}^{m} 2^{t_i} \ge 2^{s/m}$, i.e., $\sum_{i=1}^{m} 2^{t_i} \ge m2^{s/m}$. This implies that, for a given $m$, the table size $g_b$ in (47) is minimized when $t_1 = t_2 = \cdots = t_m$. Thus, we focus on only this special case ($t_i = s/m$) in this paper.

For example, let $h = 16$, $s = 8$, and $m = 1$. That is, we use a basic 16-bit CRC to protect a message consisting of input bytes ($s = 8$). This CRC is implemented using one lookup table ($m = 1$), which has $g_b = m2^{s/m} = 2^8$ entries. Using (50) with $s < h$, we have $e_b = (24m + 24)/s = 6$. That is, 6 operations are required for computing the check tuple per input byte. These results are recorded in the first row of Fig. 19. Note that, because each table entry has $h = 16$ bits, the total storage is $hg_b = 16 \times 2^8$ bits = 512 bytes (which is not shown in the figure). The results for other values of $h$, $s$, and $m$ are shown in Fig. 19.

From Fig. 19, we observe the following. First, the results for the cases $h = 8$ and $h = 16$ are identical for $s = 32, 64$, i.e., they differ only for $s = 8, 16$. This follows directly from (50). Similarly, the results for the cases $h = 16$ and $h = 32$ are identical for $s = 8, 64$, i.e., they differ only for $s = 16, 32$. Although the number of table entries $g_b = m2^{s/m}$ depends on only $s$ and $m$, the total storage is $hg_b = hm2^{s/m}$, which also depends on $h$.

Recall from Fig. 9 that the complexity results for bitwise implementation of the basic CRCs vary little over a wide range of $s$ values. In contrast, as seen in Fig. 19, those for table-lookup implementation vary greatly with $s$. These results can also be used to optimize CRC table-lookup implementation. For example, suppose that $h = 16$. Let us compare the 2 cases: ($s = 8$, $m = 1$) and ($s = 16$, $m = 4$) in Fig. 19. In both cases, the required operation count per input byte is $e_b = 6$. However, the first case requires one table of $2^8$ entries (= $16 \times 2^8$ bits = 512 bytes), while the second case requires 4 tables totaling only $2^6$ entries (= $16 \times 2^6$ bits = 128 bytes), which is 75% less than the first case. More generally, Fig. 19 shows that, for a given $e_b$, the total number of table entries $g_b$ is minimized when $s = h$.
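To make the decomposition (45) and the table definition (46) concrete, the following sketch builds $m = 2$ tables for the basic 16-bit CRC with $s = h = 16$ and $t_1 = t_2 = 8$. The names `mulxk_mod`, `build_tables`, `crc16_two_tables`, and `crc16_bitwise` are hypothetical; the generator 0x8005 is the one used in Fig. 15, and the tables are filled by repeated shift-and-reduce steps rather than by any method prescribed here.

```c
#include <stddef.h>
#include <stdint.h>

#define M16 0x8005u   /* X^16 + X^15 + X^2 + 1, leading term implicit */

/* R_M[v(X) X^k] for degree(v) < 16, by k shift-and-reduce steps. */
static uint16_t mulxk_mod(uint16_t v, int k)
{
    while (k-- > 0)
        v = (v & 0x8000u) ? (uint16_t)((v << 1) ^ M16) : (uint16_t)(v << 1);
    return v;
}

static uint16_t T1[256], T2[256];

/* T1[a] = R_M[a(X) X^(t_2 + h)], T2[a] = R_M[a(X) X^h], as in (46). */
void build_tables(void)
{
    for (unsigned a = 0; a < 256; a++) {
        T1[a] = mulxk_mod((uint16_t)a, 8 + 16);
        T2[a] = mulxk_mod((uint16_t)a, 16);
    }
}

/* Table-lookup CRC: B(X) = T1[A_1] + T2[A_2], as in (45) with m = 2. */
uint16_t crc16_two_tables(const uint16_t *Q, size_t n)
{
    uint16_t P = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t A = (uint16_t)(P ^ Q[i]);
        P = (uint16_t)(T1[A >> 8] ^ T2[A & 0xFFu]);
    }
    return P;
}

/* Bitwise reference: B(X) = R_M[A(X) X^16] by 16 shift-and-reduce steps. */
uint16_t crc16_bitwise(const uint16_t *Q, size_t n)
{
    uint16_t P = 0;
    for (size_t i = 0; i < n; i++)
        P = mulxk_mod((uint16_t)(P ^ Q[i]), 16);
    return P;
}
```

After `build_tables()` has been called once, the two routines should agree on any input; the two tables hold $2 \times 2^8 = 2^9$ entries in total, as given by (49).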







                 e_b       e_b       e_b       e_b
  s       m     h = 8     h = 16    h = 32    h = 64     g_b
s = 8     1       3         6         6         6        2^8
          2       6         9         9         9        2^5
          4      12        15        15        15        2^4
s = 16    1       2         1.5       3         3        2^16
          2       3.5       3         4.5       4.5      2^9
          4       6.5       6         7.5       7.5      2^6
          8      12.5      12        13.5      13.5      2^5
s = 32    1       1         1         0.75      1.5      2^32
          2       1.75      1.75      1.5       2.25     2^17
          4       3.25      3.25      3         3.75     2^10
          8       6.25      6.25      6         6.75     2^7
         16      12.25     12.25     12        12.75     2^6
s = 64    1       0.5       0.5       0.5       0.375    2^64
          2       0.875     0.875     0.875     0.75     2^33
          4       1.625     1.625     1.625     1.5      2^18
          8       3.125     3.125     3.125     3        2^11
         16       6.125     6.125     6.125     6        2^8
         32      12.125    12.125    12.125    12        2^7

Fig. 19 Complexity results for table-lookup technique for the basic h-bit CRCs
(e_b = operation count per input byte, g_b = total number of entries from m tables).
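The entries of Fig. 19 follow mechanically from (49) and (50). As a hedged illustration, the helper functions below (hypothetical names `g_b` and `e_b`) evaluate the two formulas for the equal-size case $t_i = s/m$; they assume that $m$ divides $s$.

```c
/* 1 if h is one of the word sizes 8, 16, 32, 64; 0 otherwise. */
static int is_word_size(int h)
{
    return h == 8 || h == 16 || h == 32 || h == 64;
}

/* 2^k as a double, so that 2^64 does not overflow an integer type. */
static double pow2(int k)
{
    double p = 1.0;
    while (k-- > 0) p *= 2.0;
    return p;
}

/* Total number of table entries, g_b = m 2^(s/m), from (49). */
double g_b(int s, int m)
{
    return m * pow2(s / m);
}

/* Operation count per input byte e_b(s, h, m), from (50). */
double e_b(int s, int h, int m)
{
    if (s == h) return 24.0 * m / s;
    if (s > h)  return (24.0 * m + 8.0) / s;
    return (24.0 * m + (is_word_size(h) ? 24.0 : 32.0)) / s;
}
```

For $h = 16$, both `e_b(8, 16, 1)` and `e_b(16, 16, 4)` evaluate to 6, while `g_b(8, 1)` is 256 and `g_b(16, 4)` is 64, which is the comparison made in the text.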

Remark 9. Both $g_b$ and $e_b$ depend on $s$, $h$, and $m$, i.e., we can write $g_b = g_b(s, h, m)$ and $e_b = e_b(s, h, m)$. Consider the 2 special cases: $m = s/2$ and $m = s$. From (49) and (50), it can be shown that $g_b(s, h, s/2) = g_b(s, h, s) = 2s$ and $e_b(s, h, s/2) < e_b(s, h, s)$. That is, these 2 cases yield the same table size, but the case $m = s/2$ always yields lower operation count than the case $m = s$. Thus, the case $m = s$ can be eliminated from our discussion. □

Remark 10. So far, $B(X)$ is computed by either the bitwise technique or the table-lookup technique. However, $B(X)$ can also be computed using both techniques as follows. Recall from (45) that $B(X)$ is the sum of $m$ terms. Suppose that we now use tables to compute the first $m - 1$ terms, and use no tables to compute the last term. More precisely, from (45), we have

$$B(X) = \sum_{i=1}^{m-1} R_{M(X)}\big[A_i(X)X^{t_{i+1} + \cdots + t_m + h}\big] + R_{M(X)}\big[A_m(X)X^h\big] = \sum_{i=1}^{m-1} T_i[A_i] + R_{M(X)}\big[A_m(X)X^h\big]$$

where the $m - 1$ tables $T_i[\,]$ are defined by $T_i[A_i] = R_{M(X)}\big[A_i(X)X^{t_{i+1} + \cdots + t_m + h}\big]$, $1 \le i < m$. Assume now that $R_{M(X)}[A_m(X)X^h]$ is computed without using tables. Thus, $B(X)$ can be computed using the 2 techniques at the same time: the table-lookup technique (with the $m - 1$ tables $T_i[\,]$) and the bitwise technique (for computing $R_{M(X)}[A_m(X)X^h]$ without using tables). In the following, this mixed technique is applied to the fast CRCs to yield small table size when $s \ge h$. □

A.2.2 Fast CRCs

Recall from Section 2.1 that, when implementing an h-bit CRC, we are free to choose the value of $s$, which is the size of each input tuple $Q_i(X)$. That is, we can choose $s < h$, $s = h$, or $s > h$. Fig. 9 shows that, under bitwise implementation, the fast CRCs are much faster than the basic CRCs for $s \le h$, in the sense that $e_f$ is much smaller than $e_b$. Further, by comparing Fig. 9 with Fig. 19, we see that the bitwise implementation of the fast CRCs (i.e., $g_f = 0$) is even faster than the table-lookup implementation of the basic CRCs (i.e., $g_b > 0$) in many cases. For example, consider the case $s = h = 32$. Fig. 19 shows that $e_b = 6$ when $g_b = 2^7$ (at $m = 8$), and $e_b = 12$ when $g_b = 2^6$ (at $m = 16$). On the other hand, Fig. 9 shows that $e_f = 3$ when $g_f = 0$. The same figures also show that, although the fast CRC requires no table lookup (i.e., $g_f = 0$) and the basic CRC requires a table of $g_b = 2^{10}$ entries (at $m = 4$), both CRCs have the same operation count $e_f = e_b = 3$.

Recall from (41) and (43) that, under bitwise implementation, $e_f$ is minimized either at $s = h - 2$ or at $s = h$. Thus, by choosing $s$ to be at (or near) these optimal values, the fast CRCs require no table lookup (i.e., $g_f = 0$) and still have low operation count (i.e., $e_f$ is small).

We now discuss table-lookup techniques for the fast CRCs generated by $F_h(X) = X^h + X^2 + X + 1$. An obvious technique is to apply the table-lookup technique in Section A.2.1 for the basic CRCs to the fast CRCs by simply letting $M(X) = F_h(X)$. The required total number of table entries is then given by (49), i.e., $g_f = g_b = m2^{s/m}$. In the following, for the case $s \ge h$, we present another table-lookup technique (which is similar to the mixed technique in Remark 10 with $m = 2$) that exploits the special structure of $F_h(X)$ to yield $g_f = 2^{s-h+2}$, which is small when $s \approx h$, e.g., $g_f = 4$ when $s = h$.

Recall that $r_f$ denotes the operation count required for computing $B(X)$. Without using tables (i.e., when $g_f = 0$), $r_f$ is given by (38), i.e., $r_f = 9 + 5.5(s - h)$ for $s \ge h$. We show below that $r_f$ is slightly reduced by using a small lookup table.

Assume that $s \ge h$. According to Fig. 10, we use Algorithm 2 to implement the table-lookup technique for the fast CRCs when $s \ge h$. From Fig. 13, we then have

$$x_f = \begin{cases} 3 & \text{if } s = h \\ 4 & \text{if } s > h \end{cases}$$

which is inserted in (33) to yield

$$e_f = \begin{cases} 8(3 + r_f)/s & \text{if } s = h \\ 8(4 + r_f)/s & \text{if } s > h \end{cases} \tag{51}$$

where $r_f$, which denotes the operation count required for computing $B(X)$ via table lookup, is determined in the following.

First, we decompose $A(X)$ into $A_1(X)$ and $A_2(X)$ such that

$$A(X) = A_1(X)X^{h-2} + A_2(X)$$
where $\mathrm{degree}(A_1(X)) < s - h + 2$ and $\mathrm{degree}(A_2(X)) < h - 2$. Using (11) with $M(X) = F_h(X)$, we have

$$\begin{aligned} B(X) &= R_{F_h(X)}\big[A(X)X^h\big] \\ &= R_{F_h(X)}\big[A(X)(X^h + F_h(X))\big] \\ &= R_{F_h(X)}\big[A(X)(X^2 + X + 1)\big] \\ &= R_{F_h(X)}\big[A_1(X)X^{h-2}(X^2 + X + 1)\big] + A_2(X)(X^2 + X + 1) \\ &= T_f[A_1] + A_2(X)(X^2 + X + 1) \end{aligned} \tag{52}$$

where $T_f[\,]$ is the table defined by

$$T_f[A_1] = R_{F_h(X)}\big[A_1(X)X^{h-2}(X^2 + X + 1)\big] \tag{53}$$

where $A_1$ denotes the $(s - h + 2)$-tuple that is composed of the binary coefficients of the polynomial $A_1(X)$ of degree less than $s - h + 2$. The table $T_f[\,]$ has $g_f = 2^{s-h+2}$ entries, and each entry contains $h$ bits. Using this table, it can be shown that the operation count required for computing $B(X)$ is $r_f = 7$. To summarize, when $s \ge h$, we have

$$r_f = 7, \qquad g_f = 2^{s-h+2} \tag{54}$$

Substituting (54) into (51), we then have the following operation count per input byte and the table size for the case $s \ge h$:

$$\begin{cases} e_f = 80/s, \quad g_f = 4 & \text{if } s = h \\ e_f = 88/s, \quad g_f = 2^{s-h+2} & \text{if } s > h \end{cases} \tag{55}$$

i.e., the table size $g_f$ grows exponentially with the difference $s - h$. Thus, this table-lookup technique is not recommended for large $s - h$. To have a small table, we must choose $s$ that is sufficiently close to $h$. The table size is minimized when $s = h$, which yields $g_f = 4$, i.e., the fast h-bit CRCs now require $e_f = 80/h$ operations per input byte and a small table of only $g_f = 4$ entries. This is about 17% lower than the bitwise technique (i.e., $g_f = 0$), which requires $e_f = 96/h$ operations per input byte. Fig. 20 shows the numerical values of (55) for $s, h \in \{8, 16, 32, 64\}$, which vary greatly with $h$ and $s$. In particular, the table size is large ($g_f \ge 2^{10}$) when $s > h$, but is very small ($g_f = 4$) when $s = h$.

For the special case $s = h$ (i.e., $g_f = 4$), it can be shown that the 4 entries of the table (53) are given by

$$\begin{aligned} T_f[0] &= 0 \\ T_f[1] &= X^{h-1} + X^{h-2} + X^2 + X + 1 \\ T_f[2] &= X^{h-1} + X^3 + 1 \\ T_f[3] &= X^{h-2} + X^3 + X^2 + X \end{aligned} \tag{56}$$

These entries in hexadecimal are shown in Fig. 21.

The table-lookup algorithm for the fast CRC (when $s \ge h$) is given in Fig. 22, where the $2^{s-h+2}$ entries of the table $T_f[\,]$ defined by (53) are stored in the top part of the algorithm. The C program for the special case $s = h = 16$, which is based on Fig. 22, is given in Fig. 23.





  h        s      e_f      g_f
h = 8      8     10        2^2
          16      5.5      2^10
          32      2.75     2^26
          64      1.375    2^58
h = 16    16      5        2^2
          32      2.75     2^18
          64      1.375    2^50
h = 32    32      2.5      2^2
          64      1.375    2^34
h = 64    64      1.25     2^2

Fig. 20 Complexity results for the fast h-bit CRCs with table lookup, s ≥ h
(e_f = operation count per input byte, g_f = total number of table entries).
 h    T_f[0]   T_f[1]              T_f[2]              T_f[3]
 8    0        c7                  89                  4e
16    0        c007                8009                400e
32    0        c0000007            80000009            4000000e
64    0        c000000000000007    8000000000000009    400000000000000e

Fig. 21 Four-entry tables for the fast h-bit CRCs generated by F_h(X) = X^h + X^2 + X + 1 (s = h).



store T_f[0], . . . , T_f[2^(s-h+2) - 1];
B = 0;
for (0 ≤ i < n)
{
    A = B X^(s-h) + Q_i;
    A_1 = (s - h + 2) left-hand bits of A;
    A_2 = (h - 2) right-hand bits of A;
    B = T_f[A_1] + A_2(X^2 + X + 1);
}
P = B;
return P;

Fig. 22 Table-lookup algorithm for the fast h-bit CRC generated by F_h(X) = X^h + X^2 + X + 1 (s ≥ h).



unsigned short
fast_CRC_table (int n, unsigned short *Q)
{
    int i;
    unsigned short A, A1, A2, P;
    static unsigned short T[4] =
        {0x0, 0xc007, 0x8009, 0x400e};
    P = 0;
    for (i=0; i<n; i=i+1)                    /* 2 */
    {
        A = P ^ Q[i];                        /* 1 */
        A1 = A >> 14;                        /* 1 */
        A2 = A & 0x3fff;                     /* 1 */
        P = T[A1]^(A2<<2)^(A2<<1)^A2;        /* 5 */
    }
    return P;
}

Fig. 23 C program with table lookup for the fast 16-bit CRC generated by F_16(X) = X^16 + X^2 + X + 1 (s = h = 16).



Remark 11. Based on the suggestion by Y. Sawada, the C program in Fig. 23 can be improved to yield the C program shown in Fig. 24. This improvement follows from the observation that A1 and A2 are the left and right parts of A, respectively, i.e., A2 can be determined from A1 and A. Thus, by modifying the table T[A1] as shown in Fig. 24, we can replace A2 by A, i.e., A2 is now no longer needed.

Using the above improvement, it can be shown in general that the table given in (56) can now be simplified to become the new table defined by

$$\begin{aligned} T_f[0] &= 0 \\ T_f[1] &= X^2 + X + 1 \\ T_f[2] &= X^3 + 1 \\ T_f[3] &= X^3 + X^2 + X \end{aligned} \tag{57}$$

i.e., T_f[0] = 0, T_f[1] = 0x7, T_f[2] = 0x9, T_f[3] = 0xe in hexadecimal.
unsigned short
fast_CRC_table (int n, unsigned short *Q)
{
    int i;
    unsigned short A, A1, P;
    static char T[4] = {0x0, 0x7, 0x9, 0xe};
    P = 0;
    for (i=0; i<n; i=i+1)                    /* 2 */
    {
        A = P ^ Q[i];                        /* 1 */
        A1 = A >> 14;                        /* 1 */
        P = T[A1]^(A<<2)^(A<<1)^A;           /* 5 */
    }
    return P;
}

Fig. 24 Improved C program with table lookup for the fast 16-bit CRC generated
by F_16(X) = X^16 + X^2 + X + 1 (s = h = 16).

□
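As a sanity check on Remark 11, the three implementations for $s = h = 16$ (bitwise as in Fig. 16, table (56) as in Fig. 23, and the simplified table (57) as in Fig. 24) can be compared directly. The sketch below is our own test scaffold with hypothetical function names; it restates the three update rules with fixed-width types so that the truncation to 16 bits is explicit.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise technique (pattern of Fig. 16). */
uint16_t fast16_bitwise(const uint16_t *Q, size_t n)
{
    uint16_t A, C, P = 0;
    for (size_t i = 0; i < n; i++) {
        A = (uint16_t)(P ^ Q[i]);
        C = (A & 0x8000u) ? (uint16_t)((A << 1) ^ 0x7u) : (uint16_t)(A << 1);
        P = (C & 0x8000u) ? (uint16_t)((C << 1) ^ 0x7u) : (uint16_t)(C << 1);
        P = (uint16_t)(P ^ C ^ A);
    }
    return P;
}

/* Table (56), pattern of Fig. 23: split A into A1 (2 bits) and A2 (14 bits). */
uint16_t fast16_table56(const uint16_t *Q, size_t n)
{
    static const uint16_t T[4] = {0x0, 0xc007, 0x8009, 0x400e};
    uint16_t P = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t A  = (uint16_t)(P ^ Q[i]);
        uint16_t A1 = (uint16_t)(A >> 14);
        uint16_t A2 = (uint16_t)(A & 0x3fffu);
        P = (uint16_t)(T[A1] ^ (A2 << 2) ^ (A2 << 1) ^ A2);
    }
    return P;
}

/* Table (57), pattern of Fig. 24: A2 is replaced by A; the bits of A that
   survive the 16-bit truncation are absorbed into the smaller entries. */
uint16_t fast16_table57(const uint16_t *Q, size_t n)
{
    static const uint16_t T[4] = {0x0, 0x7, 0x9, 0xe};
    uint16_t P = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t A = (uint16_t)(P ^ Q[i]);
        P = (uint16_t)(T[A >> 14] ^ (A << 2) ^ (A << 1) ^ A);
    }
    return P;
}
```

All three functions should return the same check tuple for every input, since the entries of (56) and (57) differ exactly by the high-order bits of A that the second program no longer masks off.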



APPENDIX B OTHER FAST ERROR-DETECTION CODES 

So far, we have applied the new technique (15) to $F_h(X) = X^h + X^2 + X + 1$ to yield the fast h-bit CRCs. We now apply this same technique to binomials and trinomials to yield even faster (but weaker) CRCs. We then construct some non-CRC error-detection codes, which are not only fast but also have optimal guaranteed error-detecting capability.

Recall from Section 3.3 that the maximum length of an error-detection code is defined to be the total bit length at or below which its minimum distance is $d \ge 3$, i.e., beyond which its minimum distance will reduce to $d = 2$. Theorem 2 shows that the maximum length of a CRC is the period of its generator polynomial. In the following, $n_b$ denotes the total bit length of a code.



B.1 Fast CRCs Generated by Binomials

Consider the h-bit CRC generated by the binomial $M(X) = X^h + 1$, which has period $h$. To avoid triviality, we assume that this CRC includes at least one input bit, i.e., $n_b > h$. From Theorem 2, this CRC then has the minimum distance $d = 2$, i.e., it is a weak code for error detection. This CRC can be implemented via Fig. 3 (for $s < h$) or Fig. 4 (for $s \ge h$). Applying the new technique (15) to $M(X) = X^h + 1$, the term $B(X)$ in these figures is given by

$$B(X) = \begin{cases} A(X) & \text{if } s \le h \\ R_{N(X)}\big[A(X)X^{s-h}\big] & \text{if } s > h \end{cases}$$

where $N(X) = (X^h + 1)X^{s-h}$. Note that by choosing $s \le h$, we have $B(X) = A(X)$, i.e., the polynomial division is eliminated.

Suppose now that s ^ h. The CRC generated by X'' -t- 1 can then be implemented by Fig. 6 with 
B{X) = A{X). Fig. 6 can be further simplified to yield the following pseudocode for computing the check 
/i-tuple P{X): 




which yields

$$P(X) = \sum_{i=0}^{n-1} Q_i(X)$$

i.e., the CRC generated by $M(X) = X^h + 1$ is identical to the block-parity checksum [5]. From the above
pseudocode, it can be shown that computing the check tuple $P(X)$ for this checksum requires $e = 24/s$
operations per input byte. Recall from Section 4 that $e_b = 8(3 + 5.5s)/s$ and $e_f = 96/s$. We then have
$e_f/e = 96/24 = 4$ and $e_b/e = 8(3 + 5.5s)/24 = (3 + 5.5s)/3$.

For example, if $s = h = 16$, then computing the check tuple $P(X)$ for the 16-bit block-parity checksum
requires $e = 24/16 = 1.5$ operations per input byte. We then have $e_f/e = 4$ and
$e_b/e = (3 + 5.5 \times 16)/3 = 30.33$. Thus, as expected, the block-parity checksum (which has minimum
distance $d = 2$) is substantially faster than the fast and basic CRCs (both of which have minimum distance $d = 4$).
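
The equivalence between the CRC generated by $X^h + 1$ and the block-parity checksum can be checked numerically. The following Python sketch is an illustration only, not the paper's implementation: it assumes that polynomials over GF(2) are stored as integers (bit $i$ is the coefficient of $X^i$) and that the check tuple is the remainder of $U(X)X^h$ divided by the generator, where $U(X)$ is the message. The helper names `gf2_mod`, `block_parity`, and `crc_by_division` are hypothetical.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def block_parity(tuples):
    """GF(2) sum (XOR) of all input tuples: P(X) = Q_0(X) + ... + Q_{n-1}(X)."""
    p = 0
    for q in tuples:
        p ^= q
    return p


def crc_by_division(tuples, h, m):
    """Reference CRC: remainder of U(X) * X^h modulo m(X), U = concatenated h-bit tuples."""
    u = 0
    for q in tuples:
        u = (u << h) | q
    return gf2_mod(u << h, m)


h = 16
binomial = (1 << h) | 1                      # X^16 + 1
message = [0xBEEF, 0x1234, 0xFFFF, 0x0001]   # arbitrary 16-bit tuples (assumed data)
print(hex(block_parity(message)), hex(crc_by_division(message, h, binomial)))
```

Both values agree because $X^h \equiv 1$ modulo $X^h + 1$, so every $h$-bit tuple contributes to the remainder without any shift.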



B.2 Fast CRCs Generated by Trinomials

Let $C$ be the CRC generated by the trinomial $T_h(X) = X^h + X + 1$. The periods $t$ of the trinomials are given
in Fig. 25 for $h \ge 2$. Note that the periods $t$ for the important cases $h = 8, 16, 32, 64, 128$ are unusually
small. In fact, Fig. 25 shows that the period is $t = h^2 - 1$ when $h$ is a power of 2. Because $T_h(X)$ is a
codeword of weight 3, the minimum distance $d$ of this CRC must satisfy $d \le 3$. From Theorem 2, we then
have $d = 3$ if $n_b \le t$, and $d = 2$ if $n_b > t$. This CRC can be implemented via Fig. 3 (for $s \le h$) or
Fig. 4 (for $s > h$). Applying the new technique (15) to $M(X) = T_h(X)$, the term $B(X)$ in these figures is
given by

$$B(X) = \begin{cases} R_{T_h(X)}\left[A(X)(X + 1)\right] & \text{if } s \le h \\ R_{N(X)}\left[A(X)X^{s-h}(X + 1)\right] & \text{if } s > h \end{cases} \tag{58}$$

where $N(X) = (X^h + X + 1)X^{s-h}$. Remark 1 implies that it is simpler to compute the $B(X)$ in (58) than
the $B(X)$ in (16). Thus, the CRC generated by the trinomial $T_h(X) = X^h + X + 1$ is faster than the fast
CRC generated by $F_h(X) = X^h + X^2 + X + 1$. However, the former has minimum distance only $d = 3$,
whereas the latter has minimum distance $d = 4$. Further, for the important cases of $h = 8, 16, 32, 64, 128$,
the maximum length of the faster CRC generated by the trinomial $T_h(X)$ is much shorter than that of the
fast CRC generated by $F_h(X)$. For example, the faster 16-bit CRC generated by $T_{16}(X)$ has $d = 3$ and
the maximum length of only 255 bits (see Fig. 25), whereas the fast CRC generated by $F_{16}(X)$ has $d = 4$
and the maximum length of $2^{15} - 1 = 32767$ bits (see Section 3.2). Thus, these 2 types of CRCs illustrate
tradeoffs between code capability and complexity.

Fig. 25 The period of the trinomial $T_h(X) = X^h + X + 1$.



B.3 Fast and Optimal Error-Detection Codes

In the following, we construct codes that are not only fast but also have optimal error-detecting capability.
The $h$-bit CRC in Section B.2, which is denoted by $C$ and has minimum distance $d = 3$, can be extended
to yield a code that has $d = 4$ by adding an overall parity bit to the $h$-bit CRC. Note that this extended
code, denoted by $C^*$, has $h^* = h + 1$ check bits and is not a CRC. The $h$-bit CRC has burst-error-detecting
capability $b = h$. The following theorem shows that the extended code $C^*$ has burst-error-detecting capability
$b = h^* = h + 1$.

Theorem 4. Let $C$ be an $h$-bit CRC generated by a polynomial $M(X)$ of degree $h$. Assume that $M(X)$ is
not a multiple of $X$, i.e., $\gcd(X, M(X)) = 1$, and that $M(X)$ has odd weight, i.e., it has an odd number of
terms. Let $C^*$ be the non-CRC code that is obtained by adding an overall parity check bit to $C$, i.e., $C^*$ has
$h^* = h + 1$ check bits. Then $C^*$ detects all error bursts of length $h + 1$ or less, i.e., its burst-error-detecting
capability is $b = h + 1$.

Proof. Let $V^*(X)$ be a codeword of $C^*$. By definition of $C^*$, we have $V^*(X) = V(X)X + \text{parity}(V(X))$,
where $V(X)$ is a codeword of $C$. Because $V(X)$ is a codeword of the CRC generated by $M(X)$, we have
$V(X) = K(X)M(X)$ for some polynomial $K(X)$ (see the proof of Theorem 2). We then have

$$V^*(X) = K(X)M(X)X + \text{parity}(K(X)M(X))$$

Let $E^*(X)$ be an error burst of length $h + 1$ or less, which has the form

$$E^*(X) = X^i(E(X) + 1)$$

where $i \ge 0$, and $E(X)$ is a polynomial such that $E(X) \ne 1$ and degree$(E(X)) \le h$. Using proof by
contradiction, we now show that $E^*(X)$ cannot be a codeword of $C^*$. Thus, assume that $E^*(X)$ is a nonzero
codeword of $C^*$, i.e.,

$$X^i(E(X) + 1) = K(X)M(X)X + \text{parity}(K(X)M(X))$$

We consider 2 cases: $i = 0$ and $i > 0$.

Case 1: $i = 0$. We then have

$$E(X) + 1 = K(X)M(X)X + \text{parity}(K(X)M(X))$$

This implies that $\text{parity}(K(X)M(X)) = 1$. Thus, $K(X) \ne 0$, which implies that
degree$(K(X)M(X)X) > h$. But we also have $E(X) = K(X)M(X)X$, which implies that
degree$(K(X)M(X)X) \le h$, which is a contradiction to the previous statement.

Case 2: $i > 0$. We then have $\text{parity}(K(X)M(X)) = 0$. Thus, $X^i(E(X) + 1) = K(X)M(X)X$. Because
$\gcd(X, M(X)) = 1$, we must have $X^i = K(X)X$. Thus, $K(X) = X^{i-1}$. We then have
$\text{parity}(K(X)M(X)) = \text{parity}(M(X)) = 1$, which is a contradiction to the previous statement that
$\text{parity}(K(X)M(X)) = 0$. □
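
Theorem 4 can be checked by brute force for a small case. The sketch below is illustrative only; it assumes the systematic encoding $V(X) = U(X)X^h + R_{M(X)}[U(X)X^h]$ for the CRC codeword and uses $M(X) = X^3 + X + 1$, for which $C^*$ has total length $2^h = 8$ bits. The helper names are hypothetical. It enumerates all nonzero codewords of $C^*$ and records the minimum weight and the shortest burst span that occurs among them.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def parity(v):
    """Overall parity bit (1 if v has an odd number of 1 bits)."""
    return bin(v).count("1") & 1


def encode_ext(u, h, m):
    """V*(X) = V(X) X + parity(V(X)), with V(X) the systematic CRC codeword of message u."""
    v = (u << h) ^ gf2_mod(u << h, m)
    return (v << 1) | parity(v)


def burst_length(e):
    """Span between the lowest and highest set bits of a nonzero pattern e."""
    low = (e & -e).bit_length() - 1
    return e.bit_length() - low


h, m = 3, 0b1011                 # M(X) = X^3 + X + 1 (odd weight, constant term 1)
k = 2 ** h - h - 1               # 4 message bits, so the total length is 2^h = 8
codewords = [encode_ext(u, h, m) for u in range(1, 2 ** k)]
min_weight = min(bin(c).count("1") for c in codewords)
shortest_burst = min(burst_length(c) for c in codewords)
print(min_weight, shortest_burst)
```

Because the code is linear, an error pattern is undetected exactly when it is itself a codeword, so a shortest span above $h + 1$ means every burst of length $h + 1$ or less is detected.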

Let $t$ be the period of the polynomial $M(X)$ in Theorem 4. The extended code $C^*$ in Theorem 4 then
has $h^* = h + 1$ check bits, the burst-error-detecting capability $b = h^* = h + 1$, the minimum distance
$d = 4$, and the maximum length of $t + 1$ bits. In the following, we show that $C^*$ becomes fast by choosing
$M(X) = T_h(X) = X^h + X + 1$, i.e., $M(X)$ is a trinomial.
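
Since the maximum length of $C^*$ is $t + 1$, it is useful to be able to compute the period $t$ of a candidate trinomial. The following sketch is an assumption-laden illustration (integers encode GF(2) polynomials; `period` and `trinomial` are hypothetical helper names): it finds the smallest $n > 0$ such that $X^n \equiv 1$ modulo $T_h(X)$ by repeated multiplication by $X$.

```python
def period(f):
    """Smallest n > 0 with X^n = 1 (mod f); assumes degree >= 2 and constant term 1."""
    h = f.bit_length() - 1
    x, n = 0b10, 1               # x holds X^n mod f, starting from X^1
    while x != 1:
        x <<= 1                  # multiply by X
        if (x >> h) & 1:         # degree reached h: reduce by adding (XOR) f
            x ^= f
        n += 1
    return n


def trinomial(h):
    """T_h(X) = X^h + X + 1."""
    return (1 << h) | 0b11


for h in (3, 7, 8, 15, 16):
    t = period(trinomial(h))
    kind = "primitive" if t == 2 ** h - 1 else "not primitive"
    print(h, t, kind, "max length of C*:", t + 1)
```

The loop runs $t$ iterations, so this direct search is only practical for moderate $h$; it is meant to reproduce individual entries of a table such as Fig. 25, not to replace a factorization-based test.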

Thus, let $M(X) = X^h + X + 1$, and $P_{CRC}(X)$ be the check $h$-tuple for the CRC generated by this
particular $M(X)$. Suppose that $s = h + 1$. Because $s > h$, the check tuple $P_{CRC}(X)$ can be computed by
Algorithm 4 (see Fig. 4), in which the term $B(X)$ is computed by (58), i.e.,

$$B(X) = R_{N(X)}\left[A(X)X(X + 1)\right] = R_{N(X)}\left[A(X)(X^2 + X)\right]$$

where $N(X) = (X^h + X + 1)X$, and degree$(A(X)) < s = h + 1$.

Recall from Theorem 4 that the non-CRC code $C^*$ is obtained by adding an overall parity check bit to
the above CRC. The overall parity bit of $C^*$ is computed as follows. First, we define

$$W(X) = \sum_{i=0}^{n-1} Q_i(X) + P_{CRC}(X)X$$

where $Q_0(X), \ldots, Q_{n-1}(X)$ are the input $s$-tuples. The overall parity bit of $C^*$ is also the parity bit of
$W(X)$. The check polynomial of $C^*$ is then

$$P(X) = P_{CRC}(X)X + \text{parity}(W(X))$$

which is a polynomial of degree $< h + 1$.
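
The role of $W(X)$ can be illustrated with a direct (not optimized) computation. The sketch below is not the algorithm of Fig. 26; it assumes that $P_{CRC}(X)$ is obtained by plain polynomial division of $U(X)X^h$, and the function name `check_tuple_ext` is hypothetical. It accumulates the sum of the input $s$-tuples while reading the message, adds $P_{CRC}(X)X$, and uses the parity of the result as the overall parity bit.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def parity(v):
    """Overall parity bit (1 if v has an odd number of 1 bits)."""
    return bin(v).count("1") & 1


def check_tuple_ext(tuples, h, m):
    """P(X) = P_CRC(X) X + parity(W(X)) for input s-tuples with s = h + 1."""
    s = h + 1
    u, w = 0, 0
    for q in tuples:
        u = (u << s) | q           # message polynomial U(X)
        w ^= q                     # running sum of the input tuples
    p_crc = gf2_mod(u << h, m)     # check h-tuple of the CRC generated by m
    w ^= p_crc << 1                # W(X) = sum of Q_i(X) + P_CRC(X) X
    return (p_crc << 1) | parity(w)


h = 15
m = (1 << h) | 0b11                # T_15(X) = X^15 + X + 1
tuples = [0x1234, 0xABCD, 0x0F0F]  # arbitrary 16-bit input tuples (assumed data)
p = check_tuple_ext(tuples, h, m)
print(hex(p))
```

The parity of $W(X)$ equals the parity of the whole CRC codeword because the parity of a GF(2) sum is the sum of the parities, which is why the message bits never need to be revisited.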

Fig. 26 shows an implementation of $C^*$, which is based on Fig. 4 (with $s = h^* = h + 1$ and
$M(X) = X^h + X + 1$) and includes the calculation of the overall parity bit of $C^*$. Let $e^*$ be the operation
count per input byte required for computing the check tuple $P(X)$ for the code $C^*$. By ignoring the negligible
complexity due to the terms outside the loop indexed by $i$ in Fig. 26, it can be shown that $e^* = 96/h^*$. It is
shown in (39) of Appendix A that the complexity for the fast $h^*$-bit CRC is also given by $e_f = 96/h^*$ (when
$s = h^*$). Thus, $e^* = e_f$, i.e., the (non-CRC) $h^*$-bit code $C^*$ is as fast as the fast $h^*$-bit CRC.

Let $M(X)$ be a primitive polynomial of degree $h$, i.e., the period of $M(X)$ is $t = 2^h - 1$. Let us now
compare the capability and complexity for the following 2 codes, each of which has $h^* = h + 1$ check
bits. The first code is the familiar basic CRC generated by $(X + 1)M(X)$, which has $d = 4$, $b = h + 1$,
and the maximum length of $2^h - 1$ bits. An example is the well-known CRC-16, which is generated by
$(X + 1)(X^{15} + X + 1) = X^{16} + X^{15} + X^2 + 1$. Under bitwise implementation, this basic CRC
requires $e_b = 45.5$ operations per input byte for computing its check tuple, provided that the input message
is composed of 16-tuples, i.e., $s = h = 16$ (see Fig. 9).

The second code is the non-CRC code $C^*$ as described in Theorem 4, which has $d = 4$, $b = h + 1$, and
the maximum length of $t + 1 = 2^h$ bits (which is 1 bit longer than the basic $(h + 1)$-bit CRC above). It
is well-known that any code that has $h + 1$ check bits and the minimum distance $d = 4$ must satisfy the
following 2 constraints: (1) the burst-error-detecting capability $b \le h + 1$ and (2) the maximum length
$\le 2^h$. Thus, the non-CRC code $C^*$ is optimal for error detection in the sense that, with $h^* = h + 1$ check
bits and $d = 4$, it has the optimal $b = h^*$ and the optimal maximum length $2^h$. In fact, at the maximum
length of $2^h$ bits, the code $C^*$ is a $(2^h, 2^h - h - 1, 4)$ extended Hamming perfect code with the optimal
burst-error-detecting capability $b = h + 1$. Also, it is well-known that the undetected error probability of
this perfect code is bounded above by $2^{-(h+1)}$.

As shown in Fig. 26, the code $C^*$ is fast when $M(X) = T_h(X) = X^h + X + 1$, i.e., $M(X)$ is a trinomial.
It is known that $T_h(X)$ is primitive for some values of $h$ [22], including $h = 3, 7, 15, 63$, and 127 (i.e.,
$h + 1 = 4, 8, 16, 64$, and 128). For example, let $h = 15$ and $s = h^* = h + 1 = 16$. Using Fig. 26, it can
be shown that the operation count per input byte required for computing the check tuple for the (non-CRC)
16-bit code $C^*$ is $e^* = 96/16 = 6$ (which is much smaller than $e_b = 45.5$ of the basic CRC-16 above). To
summarize, the (non-CRC) $h^*$-bit code $C^*$ (e.g., with $h^* = 4, 8, 16, 64$, or 128 check bits) constructed from
a primitive trinomial and an overall parity check bit has (a) the optimal error-detection capability and (b) a
fast bitwise implementation. Note that, as discussed later in Section C.2, other fast and optimal codes can
also be constructed from polynomials different from trinomials.

    B = R_N [A(X^2 + X)];
    W = W + Q_i;
    W = W + B;
    P = B + parity(W);
    return P;

Fig. 26 Algorithm for computing the fast $(h + 1)$-bit non-CRC code from
the $h$-bit CRC (generated by $X^h + X + 1$) and an overall parity bit.



APPENDIX C APPLICATION OF THE NEW TECHNIQUE TO GENERAL CRC GENERATOR
POLYNOMIALS

So far, we have applied the new technique (15) to the polynomials $X^h + X^2 + X + 1$, $X^h + X + 1$, and
$X^h + 1$ to yield fast CRCs. In this appendix, we apply the same technique to more general generator
polynomials, and then determine the conditions under which the new technique is faster than the basic
technique. In particular, we show later in Section C.1.2.1 that, when applied to the CRC-64-ISO generated by
$X^{64} + X^4 + X^3 + X + 1$, the new technique is 15 times faster than the basic technique. This appendix
presents only bitwise algorithms.

Consider an $h$-bit CRC that is generated by a general polynomial

$$F(X) = X^h + X^{i_k} + X^{i_{k-1}} + \cdots + X^{i_1} + 1 = X^h + H(X) \tag{59}$$

where $k > 0$, $h > i_k > i_{k-1} > \cdots > i_1 > 0$, and

$$H(X) = X^{i_k} + X^{i_{k-1}} + \cdots + X^{i_1} + 1 \tag{60}$$

Note that $i_k \ge k > 0$, $i_k = $ degree$(H(X))$, and $k$ = weight of $(H(X) + 1)$. Here, we have
$F(X) \ne X^h + 1$ because $i_1 > 0$, i.e., $H(X) \ne 1$. The case $F(X) = X^h + 1$ is already discussed in
Section B.1, where it is shown that the CRC reduces to the block-parity checksum.

For example, let $F(X) = X^{32} + X^7 + X^6 + X^2 + 1$. We then have $h = 32$, $k = 3$, $i_3 = 7$, $i_2 = 6$,
$i_1 = 2$, $H(X) = X^7 + X^6 + X^2 + 1$, and degree$(H(X)) = 7$.
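
The parameters $h$, $H(X)$, $k$, and $i_k$ can be read off mechanically from a generator polynomial. The sketch below is a small illustration under the assumption that polynomials are stored as integers with bit $i$ holding the coefficient of $X^i$; the function name `decompose` is hypothetical.

```python
def decompose(f):
    """Split F(X) = X^h + H(X) into (h, H, [i_1, ..., i_k]) with i_1 < ... < i_k."""
    h = f.bit_length() - 1
    big_h = f ^ (1 << h)                                   # drop the leading term X^h
    exps = [i for i in range(1, h) if (big_h >> i) & 1]    # exponents of H(X) + 1
    return h, big_h, exps


f = (1 << 32) | (1 << 7) | (1 << 6) | (1 << 2) | 1         # X^32 + X^7 + X^6 + X^2 + 1
h, big_h, exps = decompose(f)
print(h, bin(big_h), exps, "k =", len(exps), "i_k =", exps[-1])
```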

The $h$-bit CRC generated by (59) can be computed either by the basic technique (see Definition 1) or by
the new technique (15). Recall that CRC complexity refers to the operation count per input byte (denoted
by $e_b$ and $e_f$ for the basic and the fast CRCs, respectively) required for computing the CRC check tuple.
Again, we assume that the CRCs are implemented in C, and the operations are counted according to rules
(R1) and (R2) stated in Appendix A.

C.1 General CRC Generator Polynomials

First, suppose that the basic technique is used to compute the check tuple of the CRC generated by (59),
i.e., $B(X)$ is computed as in Definition 1 with $M(X) = F(X)$ in (11). From (37), we have

$$e_b = \begin{cases} 8(4 + 5.5s)/s & \text{if } s < h \\ 8(3 + 5.5s)/s & \text{if } s \ge h \end{cases} \tag{61}$$

Next, suppose that the new technique is used to compute the check tuple of the CRC generated by (59).
By letting $M(X) = F(X)$ in (15), we have

$$B(X) = \begin{cases} R_{F(X)}\left[A(X)(X^h + F(X))\right] & \text{if } s \le h \\ R_{N(X)}\left[A(X)(X^s + N(X))\right] & \text{if } s > h \end{cases} \tag{62}$$

where $N(X) = F(X)X^{s-h}$ and degree$(A(X)) < s$. Substituting (59) into (62), we have

$$B(X) = \begin{cases} R_{F(X)}\left[A(X)H(X)\right] & \text{if } s \le h \\ R_{N(X)}\left[A(X)H(X)X^{s-h}\right] & \text{if } s > h \end{cases} \tag{63}$$

To briefly illustrate the main idea, consider the special case $s = h$. Then $B(X) = R_{F(X)}\left[A(X)X^h\right]$ under
the basic technique, and $B(X) = R_{F(X)}\left[A(X)(X^{i_k} + X^{i_{k-1}} + \cdots + X^{i_1} + 1)\right]$ under the new technique.

Intuition suggests that computing $B(X)$ via the new technique is faster than the basic technique if $i_k$ is
sufficiently small. More precise conditions on $i_k$ are given in the following.
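
The two expressions for $B(X)$ in the special case $s = h$ give the same value because $X^h \equiv H(X)$ modulo $F(X)$. The following sketch checks this numerically for the CRC-CCITT generator; it is an illustration only, with integers standing for GF(2) polynomials and hypothetical helper names `gf2_mod`, `gf2_mul`, `b_basic`, and `b_new`.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def gf2_mul(a, b):
    """Carry-less product of a(X) and b(X) over GF(2)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


h = 16
F = (1 << 16) | (1 << 12) | (1 << 5) | 1   # CRC-CCITT: X^16 + X^12 + X^5 + 1
H = F ^ (1 << h)                           # H(X) = X^12 + X^5 + 1


def b_basic(a):
    """Basic technique for s = h: B(X) = R_F[A(X) X^h]."""
    return gf2_mod(a << h, F)


def b_new(a):
    """New technique for s = h: B(X) = R_F[A(X) H(X)]."""
    return gf2_mod(gf2_mul(a, H), F)


print(all(b_basic(a) == b_new(a) for a in range(1 << 12)))
```

The dividend in `b_new` has degree below $s + i_k$ instead of $s + h$, which is the source of the saving when $i_k$ is small.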

Let $e$ be the operation count per input byte required for computing the CRC check tuple under the new
technique (63). We have $e = e_f$ for the special case $F(X) = F_h(X) = X^h + X^2 + X + 1$. Although Fig. 9
shows that $e_f < e_b$, it may not be the case that $e < e_b$ for the more general polynomial $F(X)$. Thus, in
the following, we determine the conditions on $F(X)$ so that $e < e_b$ or $e_b/e > 1$ (i.e., the conditions under
which the new technique is faster than the basic technique). Thus, the new technique serves as a faster
alternative to the basic technique when $e_b/e > 1$. The computation of $e_b/e$ for many CRCs is given later
in Sections C.2-C.4. Before continuing, we present the following remarks, which contain some results that
will be used later to determine the operation count required for computing $B(X)$.

Remark 12. Let $r'$ be the number of operations required for computing

$$B'(X) = A(X)(X^{j_n} + \cdots + X^{j_1} + 1) = A(X)X^{j_n} + \cdots + A(X)X^{j_1} + A(X)$$

where $n \ge 1$, and we assume that the tuple $A(X)X^{j_n}$ can be stored in a single computer word. Computing
$A(X)X^{j_i}$ is then equivalent to shifting $A(X)$ to the left by $j_i$ bits, which can be done by a single operation
on most computers. Thus, for a given $A(X)$, we can compute $B'(X)$ by using $n$ left-shift operations and $n$
addition operations. We then have $r' = 2n$. □
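
The counting argument of Remark 12 can be mirrored in a few lines. The sketch below is illustrative: it assumes integers encode GF(2) polynomials, treats XOR as the addition, and the function name `shift_add` is hypothetical. It returns both $B'(X)$ and the number of shift and addition operations used.

```python
def shift_add(a, exps):
    """Return (A(X) * (X^{j_n} + ... + X^{j_1} + 1), number of operations used)."""
    b, ops = a, 0
    for j in exps:
        b ^= a << j      # one left shift and one addition (XOR) per term
        ops += 2
    return b, ops


# A(X) = X^3 + X + 1 multiplied by X^4 + X^3 + X + 1 (n = 3 shifted terms)
print(shift_add(0b1011, [1, 3, 4]))
```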

Remark 13. Let $M^*(X)$ and $A^*(X)$ be 2 polynomials. Let $n$ and $q$ be such that $n \ge q$. Let $r^*$ be the
number of operations required for computing

$$B^*(X) = R_{M^*(X)}\left[A^*(X)(X^{j_n} + \cdots + X^{j_q} + 1)\right]$$
$$= R_{M^*(X)}\left[A^*(X)X^{j_n}\right] + \cdots + R_{M^*(X)}\left[A^*(X)X^{j_q}\right] + R_{M^*(X)}\left[A^*(X)\right]$$

We assume in the following that $R_{M^*(X)}\left[A^*(X)X^{j_i}\right] \ne A^*(X)X^{j_i}$, $i = n, \ldots, q$, i.e., the polynomial
division is needed. Define

$$C_{q-1}(X) = R_{M^*(X)}\left[A^*(X)\right]$$
$$C_q(X) = R_{M^*(X)}\left[C_{q-1}(X)X^{j_q}\right]$$
$$C_{q+1}(X) = R_{M^*(X)}\left[C_q(X)X^{j_{q+1} - j_q}\right]$$
$$\vdots$$
$$C_{m+q}(X) = R_{M^*(X)}\left[C_{m+q-1}(X)X^{j_{m+q} - j_{m+q-1}}\right]$$
$$\vdots$$
$$C_n(X) = R_{M^*(X)}\left[C_{n-1}(X)X^{j_n - j_{n-1}}\right]$$

Let $r_0$ be the operation count required for computing $C_{q-1}(X)$. Given $C_{q-1}(X)$, the term
$C_q(X) = R_{M^*(X)}\left[C_{q-1}(X)X^{j_q}\right]$ can be computed with $5.5j_q$ operations, and so on. Given $C_{n-1}(X)$, the
final term $C_n(X) = R_{M^*(X)}\left[C_{n-1}(X)X^{j_n - j_{n-1}}\right]$ can be computed in $5.5(j_n - j_{n-1})$ operations. Thus,
computing $C_{q-1}(X), C_q(X), \ldots, C_n(X)$ altogether requires

$$r_0 + 5.5j_q + 5.5(j_{q+1} - j_q) + \cdots + 5.5(j_n - j_{n-1}) = r_0 + 5.5j_n$$

operations. Because $B^*(X) = C_{q-1}(X) + C_q(X) + \cdots + C_n(X)$, the tuple $B^*(X)$ is computed by using
$(n - q + 1)$ addition operations. Overall, $B^*(X)$ can be computed with

$$r^* = 5.5j_n + n - q + 1 + r_0$$

operations, where $r_0$ is the operation count required for computing $R_{M^*(X)}\left[A^*(X)\right]$. □
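
The chained computation in Remark 13 can be illustrated as follows. This is a sketch under stated assumptions, not the paper's code: integers encode GF(2) polynomials, `gf2_mod` performs plain bitwise division, and the names `chained_remainders` and `op_count` are hypothetical. Each $C$ term is obtained from the previous one by shifting only by the gap between consecutive exponents, so the total shift distance is $j_n$ rather than the sum of all exponents.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


def chained_remainders(a_star, m_star, exps):
    """B*(X) as the sum C_{q-1} + C_q + ... + C_n; exps = [j_q, ..., j_n], increasing."""
    c = gf2_mod(a_star, m_star)                # C_{q-1}(X) = R[A*(X)]
    total, prev = c, 0
    for j in exps:
        c = gf2_mod(c << (j - prev), m_star)   # shift by the exponent gap, then reduce
        total ^= c
        prev = j
    return total


def op_count(exps, r0):
    """r* = 5.5 j_n + (n - q + 1) + r_0, following the count in Remark 13."""
    return 5.5 * exps[-1] + len(exps) + r0


m_star = (1 << 16) | (1 << 12) | (1 << 5) | 1  # example modulus (assumed)
a_star = 0xABCD
exps = [5, 12]
direct = gf2_mod(a_star ^ (a_star << 5) ^ (a_star << 12), m_star)
print(chained_remainders(a_star, m_star, exps) == direct, op_count(exps, 0))
```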

Recall that, given the CRC generated by (59), our goal here is to determine the complexity for computing
the check $h$-tuple $P(X)$ for an input message that consists of $n$ tuples $Q_0(X), Q_1(X), \ldots, Q_{n-1}(X)$. Each
tuple $Q_i(X)$ has $s$ bits. As shown in Figs. 1-4, the check tuple $P(X)$ is computed by using a loop that
computes $B(X)$ for $n$ times, where $B(X)$ can be computed by the basic technique (see Definition 1) or
by the new technique (15). In the following, we compare the complexity between these 2 techniques. We
consider 2 cases: $s \ge h$ and $s < h$.

C.1.1 Case: $s \ge h$

In this case, according to Fig. 10, the new technique uses Algorithm 4 (shown in Fig. 4), which contains the
computation of $B(X)$. From (60) and (63), we have

$$B(X) = R_{N(X)}\left[A(X)H(X)X^{s-h}\right]$$
$$= R_{N(X)}\left[A^*(X)X^{i_k}\right] + R_{N(X)}\left[A^*(X)X^{i_{k-1}}\right] + \cdots + R_{N(X)}\left[A^*(X)X^{i_1}\right] + R_{N(X)}\left[A^*(X)\right]$$

where $A^*(X) = A(X)X^{s-h}$. Using Remark 3, it can be shown that $R_{N(X)}\left[A^*(X)\right] = R_{N(X)}\left[A(X)X^{s-h}\right]$ can
be computed with $r_0 = 5.5(s - h)$ operations (see Appendix C). Applying Remark 13 with $M^*(X) = N(X)$,
$n = k$, $j_n = i_k$, $q = 1$, and $r_0 = 5.5(s - h)$, the tuple $B(X)$ can be computed with
$r = 5.5(s - h + i_k) + k$ operations.

Fig. 13 shows that $x = 3$ for $s \ge h$ under Algorithm 4. By substituting the values of $x$ and $r$ into (33),
the operation count per input byte required for computing the check tuple under the new technique is

$$e = \frac{8[3 + 5.5(s - h + i_k) + k]}{s} \tag{64}$$

Using (61) and (64), we have

$$\frac{e_b}{e} = \frac{3 + 5.5s}{3 + 5.5(s - h + i_k) + k} \tag{65}$$

Thus, the new technique is faster than the basic technique when $e_b/e > 1$, i.e.,
$3 + 5.5s > 3 + 5.5(s - h + i_k) + k$, which is equivalent to

$$i_k < h - k/5.5$$

where $i_k = $ degree$(H(X))$ and $k$ = weight of $(H(X) + 1)$.

Remark 15. Suppose that $F(X)$ is either $X^h + X^2 + X + 1$ or $X^h + X + 1$. Then $i_k \le 2$, i.e., $i_k$ is a
very small value. Thus, it is appropriate to use loop unrolling in the calculation of $C_m(X)$ above. Then (64)
reduces to $e = 8[3 + 5.5(s - h) + 3.5i_k + k]/s$, and then

$$\frac{e_b}{e} = \frac{3 + 5.5s}{3 + 5.5(s - h) + 3.5i_k + k} \tag{66}$$

Note that it is common to choose $s, h \in \{8, 16, 32, 64\}$, i.e., the typical values of $s$ and $h$ are not very small,
even when $i_k$ is very small.

For example, suppose now that $s = h$ and $F(X) = F_h(X) = X^h + X^2 + X + 1$, i.e., $k = i_k = 2$ and
$e = e_f$. Substituting $s = h$ and $k = i_k = 2$ into (66), we have

$$\frac{e_b}{e} = \frac{e_b}{e_f} = \frac{3 + 5.5h}{3 + 3.5 \times 2 + 2} = 0.25 + 0.458h$$

as previously shown in (29). □

C.1.2 Case: $s < h$

In this case, according to Fig. 10, the new technique uses Algorithm 3 (shown in Fig. 3), which contains the
computation of $B(X)$. From (63), we have $B(X) = R_{F(X)}\left[A(X)H(X)\right]$. From Fig. 13, we have

$$x = \begin{cases} 6 & \text{if } h = 8, 16, 32, 64 \\ 7 & \text{if } h \ne 8, 16, 32, 64 \end{cases}$$

By substituting the values of $x$ into (33), the operation count per input byte required for computing the
CRC check tuple under the new technique is

$$e = \begin{cases} 8(6 + r)/s & \text{if } h = 8, 16, 32, 64 \\ 8(7 + r)/s & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{67}$$

where $r$ is the number of operations required for computing $B(X) = R_{F(X)}\left[A(X)H(X)\right]$. From (61), we
have $e_b = 8(4 + 5.5s)/s$ for $s < h$, which is used with (67) to yield

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(6 + r) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(7 + r) & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{68}$$

where $r$, which depends on whether $i_k \le h - s$ or $i_k > h - s$, is computed in the following subsections.
As seen below, the condition $i_k \le h - s$ implies that $B(X) = R_{F(X)}\left[A(X)H(X)\right] = A(X)H(X)$, i.e., the
polynomial division is eliminated.

C.1.2.1 Case: $s < h$ and $i_k \le h - s$

Because degree$(A(X)) < s$ and degree$(H(X)) = i_k$, we have degree$(A(X)H(X)) < s + i_k$. The assumption
$i_k \le h - s$ then implies that degree$(A(X)H(X)) < h$. Thus, $B(X) = R_{F(X)}\left[A(X)H(X)\right] = A(X)H(X)$,
i.e., the polynomial division is eliminated. Let $r$ be the number of operations required for computing
$B(X) = A(X)H(X)$. Using (60), we have

$$B(X) = A(X)X^{i_k} + A(X)X^{i_{k-1}} + \cdots + A(X)X^{i_1} + A(X)$$

Applying Remark 12, we then have $r = 2k$, which is substituted into (67) and (68) to yield

$$e = \begin{cases} 8(6 + 2k)/s & \text{if } h = 8, 16, 32, 64 \\ 8(7 + 2k)/s & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{69}$$

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(6 + 2k) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(7 + 2k) & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{70}$$

Thus, the new technique is faster than the basic technique if $e_b/e > 1$, i.e.,

$$4 + 5.5s > \begin{cases} 6 + 2k & \text{if } h = 8, 16, 32, 64 \\ 7 + 2k & \text{if } h \ne 8, 16, 32, 64 \end{cases}$$

which is equivalent to

$$k < \begin{cases} 2.75s - 1 & \text{if } h = 8, 16, 32, 64 \\ 2.75s - 1.5 & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{71}$$

where $i_k = $ degree$(H(X))$ and $k$ = weight of $(H(X) + 1)$.

For example, consider the CRC-64-ISO generated by the primitive polynomial

$$F(X) = X^{64} + X^4 + X^3 + X + 1$$

Here, we have $h = 64$, $k = 3$, and $i_k = 4$. Assume that $s \le h - i_k = 60$. Under the new technique, we then
have

$$B(X) = R_{F(X)}\left[A(X)(X^4 + X^3 + X + 1)\right] = A(X)X^4 + A(X)X^3 + A(X)X + A(X)$$

i.e., the polynomial division is eliminated. Substituting $k = 3$ into (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{12}$$

For the special case $s = 32$, we have $e_b/e = 15$, i.e., the new technique is 15 times faster than the basic
technique for the CRC-64-ISO. We also have $e_b/e = 15$ when $s = 32$ for a 64-bit CRC generated by a
polynomial that has the following more general form

$$F(X) = X^{64} + X^{i_3} + X^{i_2} + X^{i_1} + 1$$

where $32 \ge i_3 > i_2 > i_1 > 0$.

Remark 14. For the CRC-64-ISO, the value $e_b/e$ shown above is based on the implementation for which
$B(X)$ is computed directly by the single statement $B(X) = A(X)X^4 + A(X)X^3 + A(X)X + A(X)$. An
alternative implementation, suggested by Y. Sawada, is to compute $B(X)$ via 2 statements: first, compute
$B'(X) = A(X)X + A(X)$, and then compute $B(X) = B'(X)X^3 + B'(X)$. □
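
Both forms of the division-free computation can be compared against plain polynomial division. The sketch below is illustrative only, assuming integers encode GF(2) polynomials and that degree$(A(X)) < 60$; the helper names `b_direct`, `b_two_step`, and `b_basic` are hypothetical. The two-statement form works because $X^4 + X^3 + X + 1 = (X + 1)(X^3 + 1)$.

```python
def gf2_mod(a, m):
    """Remainder of a(X) divided by m(X) over GF(2); bit i is the coefficient of X^i."""
    dm = m.bit_length() - 1
    while a.bit_length() - 1 >= dm:
        a ^= m << (a.bit_length() - 1 - dm)
    return a


h = 64
F = (1 << 64) | 0b11011             # CRC-64-ISO: X^64 + X^4 + X^3 + X + 1


def b_direct(a):
    """Single statement: B = A X^4 + A X^3 + A X + A."""
    return (a << 4) ^ (a << 3) ^ (a << 1) ^ a


def b_two_step(a):
    """Two statements: B' = A X + A, then B = B' X^3 + B'."""
    b1 = (a << 1) ^ a
    return (b1 << 3) ^ b1


def b_basic(a):
    """Basic technique with division: B = R_F[A X^64]."""
    return gf2_mod(a << h, F)


samples = [0xDEADBEEF, (1 << 60) - 1, 1]   # all of degree below 60 (assumed inputs)
print(all(b_direct(a) == b_two_step(a) == b_basic(a) for a in samples))
```

The two-statement form uses 2 shifts and 2 additions instead of 3 and 3, at the cost of a data dependency between the statements.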

C.1.2.2 Case: $s < h$ and $i_k > h - s$

As seen below, the computation of $B(X)$ requires the polynomial division in this case. The assumption
$i_k > h - s$ implies that there exists $m^*$ such that $1 \le m^* \le k$, $i_{m^*} > h - s$, and $i_j \le h - s$ for all $j < m^*$.
There are 3 subcases to consider.

Case 1: $1 < m^* < k$. By letting $m = m^* - 1$, we have

$$F(X) = X^h + X^{i_k} + \cdots + X^{i_{m+1}} + X^{i_m} + \cdots + X^{i_1} + 1$$

where $h > i_k > \cdots > i_{m+1} > i_m > \cdots > i_1 > 0$, $i_{m+1} > h - s$, and $i_m \le h - s$.

Because $i_m \le h - s$, we have degree$(A(X)X^{i_n}) < h$, for $1 \le n \le m$. Thus,

$$R_{F(X)}\left[A(X)X^{i_n}\right] = A(X)X^{i_n}$$

for $1 \le n \le m$. From (63), we then have

$$B(X) = R_{F(X)}\left[A(X)X^{i_k}\right] + \cdots + R_{F(X)}\left[A(X)X^{i_{m+1}}\right] + A(X)X^{i_m} + \cdots + A(X)X^{i_1} + A(X)$$

By letting $A^*(X) = A(X)X^{i_{m+1}}$, we can write

$$B(X) = B_1(X) + B_2(X)$$

where

$$B_1(X) = R_{F(X)}\left[A^*(X)X^{i_k - i_{m+1}}\right] + \cdots + R_{F(X)}\left[A^*(X)X^{i_{m+2} - i_{m+1}}\right] + R_{F(X)}\left[A^*(X)\right]$$

and

$$B_2(X) = A(X)X^{i_m} + \cdots + A(X)X^{i_1} + A(X)$$

Because $R_{F(X)}\left[A^*(X)\right] = R_{F(X)}\left[A(X)X^{i_{m+1}}\right] = R_{F(X)}\left[(A(X)X^{h-s})X^{i_{m+1} - (h-s)}\right]$, the term
$R_{F(X)}\left[A^*(X)\right]$ can be computed with $r_0 = 1 + 5.5[i_{m+1} - (h - s)]$ operations for a given $A(X)$. Using
Remark 13, $B_1(X)$ can be computed with

$$r_1 = 5.5(i_k - i_{m+1}) + k - (m + 2) + 1 + r_0 = 5.5[i_k - (h - s)] + k - m$$

operations. Using Remark 12, $B_2(X)$ can be computed with $r_2 = 2m$ operations. Overall, the number of
operations required for computing $B(X)$ is

$$r = r_1 + r_2 + 1 = 5.5[i_k - (h - s)] + k + m + 1 \tag{72}$$

which is substituted into (68) to yield

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(6 + 5.5[i_k - (h - s)] + k + m + 1) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(7 + 5.5[i_k - (h - s)] + k + m + 1) & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{73}$$

Thus, the new technique is faster than the basic technique when

$$4 + 5.5s > \begin{cases} 6 + 5.5[i_k - (h - s)] + k + m + 1 & \text{if } h = 8, 16, 32, 64 \\ 7 + 5.5[i_k - (h - s)] + k + m + 1 & \text{if } h \ne 8, 16, 32, 64 \end{cases}$$

which is equivalent to

$$i_k < \begin{cases} h - (3 + k + m)/5.5 & \text{if } h = 8, 16, 32, 64 \\ h - (4 + k + m)/5.5 & \text{if } h \ne 8, 16, 32, 64 \end{cases}$$

where $i_k = $ degree$(H(X))$ and $k$ = weight of $(H(X) + 1)$. Recall that we also assume that
$h > i_k > \cdots > i_{m+1} > i_m > \cdots > i_1 > 0$, $i_{m+1} > h - s$, and $i_m \le h - s$.

For example, consider the CRC-32-IEEE 802.3 generated by the following primitive polynomial:

$$F(X) = X^{32} + X^{26} + X^{23} + X^{22} + X^{16} + X^{12} + X^{11} + X^{10} + X^8 + X^7 + X^5 + X^4 + X^2 + X + 1 \tag{74}$$

i.e., $h = 32$, $k = 13$, $i_k = 26$. Assume that $s = 16$. We then have $m = 10$. Substituting these values into (73)
yields $e_b/e = 92/85$, i.e., the new technique is slightly faster than the basic technique.

Case 2: $m^* = 1$. We then have

$$F(X) = X^h + X^{i_k} + \cdots + X^{i_1} + 1$$

where $i_1 > h - s$. We have

$$B(X) = R_{F(X)}\left[A(X)X^{i_k}\right] + \cdots + R_{F(X)}\left[A(X)X^{i_2}\right] + R_{F(X)}\left[A(X)X^{i_1}\right] + A(X)$$
$$= R_{F(X)}\left[A^*(X)X^{i_k - i_1}\right] + \cdots + R_{F(X)}\left[A^*(X)X^{i_2 - i_1}\right] + R_{F(X)}\left[A^*(X)\right] + A(X)$$
$$= B_1(X) + A(X)$$

where $A^*(X) = A(X)X^{i_1}$ and

$$B_1(X) = R_{F(X)}\left[A^*(X)X^{i_k - i_1}\right] + \cdots + R_{F(X)}\left[A^*(X)X^{i_2 - i_1}\right] + R_{F(X)}\left[A^*(X)\right]$$

Because $R_{F(X)}\left[A^*(X)\right] = R_{F(X)}\left[A(X)X^{i_1}\right] = R_{F(X)}\left[(A(X)X^{h-s})X^{i_1 - (h-s)}\right]$, the term
$R_{F(X)}\left[A^*(X)\right]$ can be computed with $r_0 = 1 + 5.5[i_1 - (h - s)]$ operations for a given $A(X)$. Using
Remark 13, $B_1(X)$ can be computed with

$$r_1 = 5.5(i_k - i_1) + k - 2 + 1 + r_0 = 5.5[i_k - (h - s)] + k$$

operations. Thus, the number of operations required for computing $B(X)$ is

$$r = r_1 + 1 = 5.5[i_k - (h - s)] + k + 1$$

which is substituted into (68) to yield

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(6 + 5.5[i_k - (h - s)] + k + 1) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(7 + 5.5[i_k - (h - s)] + k + 1) & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{75}$$

Case 3: $m^* = k$. We then have

$$F(X) = X^h + X^{i_k} + \cdots + X^{i_1} + 1$$

where $i_k > h - s$, and $i_n \le h - s$ for all $n < k$. We have

$$B(X) = R_{F(X)}\left[A(X)X^{i_k}\right] + A(X)X^{i_{k-1}} + \cdots + A(X)X^{i_1} + A(X) = R_{F(X)}\left[A(X)X^{i_k}\right] + B_2(X)$$

where

$$B_2(X) = A(X)X^{i_{k-1}} + \cdots + A(X)X^{i_1} + A(X)$$

Because $R_{F(X)}\left[A(X)X^{i_k}\right] = R_{F(X)}\left[(A(X)X^{h-s})X^{i_k - (h-s)}\right]$, the term $R_{F(X)}\left[A(X)X^{i_k}\right]$ can be
computed with $r_0 = 1 + 5.5[i_k - (h - s)]$ operations for a given $A(X)$. Using Remark 12, $B_2(X)$ can be
computed with $r_2 = 2(k - 1)$ operations. Thus, the number of operations required for computing $B(X)$ is

$$r = r_0 + r_2 + 1 = 5.5[i_k - (h - s)] + 2k$$

which is substituted into (68) to yield

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(6 + 5.5[i_k - (h - s)] + 2k) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(7 + 5.5[i_k - (h - s)] + 2k) & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{76}$$

For example, consider the CRC-32-IEEE 802.3 generated by (74), i.e., $h = 32$, $k = 13$, $i_k = 26$. Assume
that $s = 8$. Substituting these values into (76) yields $e_b/e = 48/43$, i.e., the new technique is slightly faster
than the basic technique.
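
The ratios (65), (70), (73), (75), and (76) can be collected into one small calculator. The sketch below is an illustration, not part of the paper: the function name `speedup` is hypothetical, and it relies on the observation (assumed here, and checkable from the formulas above) that for $s < h$ and $i_k > h - s$ the three subcases share the form $r = 5.5[i_k - (h - s)] + k + m + 1$, where $m$ is the number of exponents $i_j \le h - s$ ($m = 0$ gives (75) and $m = k - 1$ gives (76)). Exact fractions are used so that the worked examples can be reproduced without rounding.

```python
from fractions import Fraction

COST = Fraction(11, 2)   # the per-bit division cost 5.5 used throughout the text


def speedup(exps, h, s):
    """Ratio e_b / e for F(X) = X^h + sum of X^i (i in exps) + 1 and tuple size s."""
    k, ik = len(exps), max(exps)
    if s >= h:                                   # ratio (65)
        return (3 + COST * s) / (3 + COST * (s - h + ik) + k)
    x = 6 if h in (8, 16, 32, 64) else 7
    if ik <= h - s:                              # ratio (70): division eliminated
        r = 2 * k
    else:                                        # ratios (73), (75), (76)
        m = sum(1 for i in exps if i <= h - s)
        r = COST * (ik - (h - s)) + k + m + 1
    return (4 + COST * s) / (x + r)


crc64_iso = [1, 3, 4]
crc32 = [1, 2, 4, 5, 7, 8, 10, 11, 12, 16, 22, 23, 26]
crc_ccitt = [5, 12]
crc16 = [2, 15]

print(speedup(crc64_iso, 64, 32))   # CRC-64-ISO with s = 32
print(speedup(crc32, 32, 16))       # CRC-32 with s = 16
print(speedup(crc32, 32, 8))        # CRC-32 with s = 8
print(speedup(crc_ccitt, 16, 8))    # CRC-CCITT with s = 8
print(speedup(crc16, 16, 8))        # CRC-16 with s = 8
```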



C.2 CRC Generator Polynomials of Weight 3

We now consider the special case $k = 1$, i.e., $F(X)$ is a polynomial of weight 3: $F(X) = X^h + X^{i_1} + 1$. By
defining $i = i_1$, we have

$$F(X) = X^h + X^i + 1 \tag{77}$$

where $h > i > 0$. Note that $F(X) = T_h(X) = X^h + X + 1$ for the special case $i = 1$. Fig. 27 lists some
weight-3 polynomials along with their periods, for $h \le 32$. In Section B.3, the fast $h^*$-bit perfect codes are
constructed from the CRCs generated by $T_h(X)$ for $h^* = 4, 8, 16, 64, 128$, where $h^* = h + 1$. In the following,
we show that other fast perfect codes can also be constructed from CRCs generated by weight-3 polynomials.

Let $C$ be the $h$-bit CRC generated by $F(X)$ in (77). Recall from Theorem 2 that the maximum length
of $C$ equals the period of $F(X)$. Assume that $s \le h - i$. Using the new technique, we have
$B(X) = R_{F(X)}\left[A(X)(X^i + 1)\right] = A(X)(X^i + 1)$, i.e., the polynomial division is eliminated. Let $e$ be the
operation count per input byte required for computing the check tuple $P(X)$ of the $h$-bit CRC $C$. Then $e$ is
given by (69).

Let $C^*$ be the non-CRC code that is constructed by adding an overall parity bit to the $h$-bit CRC $C$,
and $P^*(X)$ be the check tuple of $C^*$. Let $e^*$ be the operation count per input byte required for computing
$P^*(X)$. Note that $P^*(X)$ has $h + 1$ bits, which can be computed by an algorithm that is similar to Fig. 26.
Using (69) and Fig. 26, it can then be shown that

$$e^* = \begin{cases} 8(7 + 2k)/s & \text{if } h = 8, 16, 32, 64 \\ 8(8 + 2k)/s & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{78}$$

Substituting $k = 1$ into (78), we have

$$e^* = \begin{cases} 72/s & \text{if } h = 8, 16, 32, 64 \\ 80/s & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{79}$$

Let us now compare the speed of the $(h + 1)$-bit code $C^*$ with that of the fast $(h + 1)$-bit CRC generated
by $F_{h+1}(X) = X^{h+1} + X^2 + X + 1$. From (39), we have

$$e_f = \begin{cases} 80/s & \text{if } h + 1 = 8, 16, 32, 64 \\ 88/s & \text{if } h + 1 \ne 8, 16, 32, 64 \end{cases} \tag{80}$$

for $s < h$. By comparing (79) with (80), we have

$$e^* \le e_f \tag{81}$$

for $s \le h - i$. Thus, (81) shows that the $(h + 1)$-bit non-CRC code $C^*$ is at least as fast as the fast $(h + 1)$-bit
CRC generated by $F_{h+1}(X)$, i.e., $C^*$ is also a fast code for $s \le h - i$. Further, at its maximum length, the
code $C^*$ is the $(2^h, 2^h - h - 1, 4)$ extended Hamming perfect code, provided that $F(X) = X^h + X^i + 1$ is a
primitive polynomial (i.e., its period is $2^h - 1$).

For example, let $F(X) = X^{11} + X^2 + 1$, i.e., $h = 11$ and $i = 2$. Fig. 27 shows that $F(X)$ is primitive. Let
$C$ be the 11-bit CRC generated by $F(X)$. The non-CRC code $C^*$, which is constructed by adding an overall
parity bit to $C$, is the (2048, 2036, 4) extended Hamming perfect code. Note that both $C$ and $C^*$ are fast if
we choose $s \le h - i = 9$. Suppose that we choose $s = 8$. From (79) and (80), we then have $e^* = 80/8 = 10$
and $e_f = 88/8 = 11$, i.e., $e^* < e_f$. Thus, for $s = 8$, the non-CRC 12-bit code $C^*$ is faster than the fast 12-bit
CRC generated by $F_{12}(X) = X^{12} + X^2 + X + 1$. Further, the maximum length of the non-CRC code (which
is 2,048 bits) is also much longer than that of the fast CRC generated by $F_{12}(X)$ (which is 595 bits), and 2
bits longer than that of the CRC generated by $F(X) = X^{12} + X^3 + X + 1$ (which is 2,046 bits, as discussed
later in Section C.3).
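
The comparison in (79)-(81) is simple arithmetic and can be tabulated directly. The sketch below is illustrative only; the function names `e_star` and `e_fast` are hypothetical, and the constants are taken from (79) and (80) under their stated ranges of $s$.

```python
POW2 = (8, 16, 32, 64)


def e_star(h, s):
    """(79): operations per input byte for the (h+1)-bit code C* (weight-3 generator)."""
    return (72 if h in POW2 else 80) / s


def e_fast(h, s):
    """(80): operations per input byte for the fast (h+1)-bit CRC generated by F_{h+1}(X)."""
    return (80 if h + 1 in POW2 else 88) / s


h, i, s = 11, 2, 8                     # F(X) = X^11 + X^2 + 1, with s <= h - i = 9
print(e_star(h, s), e_fast(h, s))      # operation counts for the 12-bit codes
print(2 ** h, 2 ** h - h - 1)          # length and message bits of the perfect code
```

Since $h$ and $h + 1$ cannot both belong to $\{8, 16, 32, 64\}$, the pairs of constants that can occur are (72, 88), (80, 80), and (80, 88), which is why (81) holds with equality only when $h + 1$ is one of these word sizes.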

Similarly, we can construct a 32-bit extended Hamming perfect code $C^*$ by adding an overall parity bit
to the CRC $C$ generated by $F(X) = X^{31} + X^3 + 1$ (see Fig. 27). We have $h = 31$ and $i = 3$. Both $C$ and
$C^*$ are fast if we choose $s \le h - i = 28$.

Fig. 27 The period of $X^h + X^i + 1$.

C.3 CRC Generator Polynomials of Weight 4

We now consider the special case $k = 2$, i.e., $F(X)$ is a polynomial of weight 4:

$$F(X) = X^h + X^{i_2} + X^{i_1} + 1$$

where $h > i_2 > i_1 > 0$. In particular, $F(X) = F_h(X) = X^h + X^2 + X + 1$ when $i_2 = 2$ and $i_1 = 1$. Fig. 28
lists some weight-4 polynomials $F(X) = X^h + X^{i_2} + X^{i_1} + 1$, which have periods that are greater than
those of $F_h(X)$, for $h \le 32$. Recall from Theorem 2 that the maximum length of a CRC equals the period
of its generator polynomial. In the following, we consider the application of the new technique to weight-4
generator polynomials for CRCs such as CRC-16 and CRC-CCITT. For brevity, we only present the results
for $s < h$ (the case $s \ge h$ can be handled similarly). There are 3 cases to consider.

Case 1: $i_2 \le h - s$ (i.e., $s \le h - i_2$). Using the new technique, we have
$B(X) = R_{F(X)}\left[A(X)(X^{i_2} + X^{i_1} + 1)\right] = A(X)(X^{i_2} + X^{i_1} + 1)$, i.e., the polynomial division is
eliminated. Substituting $k = 2$ into (69), we have

$$e = \begin{cases} 80/s & \text{if } h = 8, 16, 32, 64 \\ 88/s & \text{if } h \ne 8, 16, 32, 64 \end{cases} \tag{82}$$

By comparing (82) with (39), we have

$$e = e_f \tag{83}$$

for $s \le h - i_2$. Using $k = 2$ and $s \le h - i_2$ in (71), it can be shown that the new technique is faster than the
basic technique when

$$2 \le s \le h - i_2 \tag{84}$$

For example, let $F(X) = X^{32} + X^4 + X + 1$, i.e., $h = 32$ and $i_2 = 4$. It follows from (84) that the new
technique is faster than the basic technique when $2 \le s \le 28$. Under this condition, we have

$$B(X) = A(X)(X^4 + X + 1) \tag{85}$$

i.e., the polynomial division is eliminated. Fig. 28 shows that the 32-bit CRC generated by $F(X)$ has
the maximum length of $2{,}147{,}483{,}647 = 2^{31} - 1$ bits ($\approx 268{,}435{,}456$ bytes). Recall from Fig. 5 that the
original fast 32-bit CRC, generated by $F_{32}(X) = X^{32} + X^2 + X + 1$, has the maximum length of
$2{,}097{,}151 \approx (2^{31} - 1)/1024$ bits ($\approx 262{,}143$ bytes). Thus, the maximum length of the CRC generated by
$F(X)$ is substantially larger than that of the fast CRC generated by $F_{32}(X)$. However, (83) shows that these
2 CRCs have identical complexity when $s \le 28$.

Consider the 12-bit CRC generated by $F(X) = X^{12} + X^3 + X + 1$. Fig. 28 shows that this CRC
has the maximum length of 2,046 bits, which is much larger than that of the fast CRC generated by
$F_{12}(X) = X^{12} + X^2 + X + 1$, which has the maximum length of only 595 bits (see Fig. 5). However, (83)
shows that these 2 CRCs have identical complexity when $s \le 9$.

Case 2: $i_2 > h - s$ and $i_1 \le h - s$. Using the new technique, we have
$B(X) = R_{F(X)}\left[A(X)(X^{i_2} + X^{i_1} + 1)\right] = R_{F(X)}\left[A(X)X^{i_2}\right] + A(X)X^{i_1} + A(X)$. Substituting $k = 2$
into (76) yields

$$\frac{e_b}{e} = \begin{cases} (4 + 5.5s)/(10 + 5.5[i_2 - (h - s)]) & \text{if } h = 8, 16, 32, 64 \\ (4 + 5.5s)/(11 + 5.5[i_2 - (h - s)]) & \text{if } h \ne 8, 16, 32, 64 \end{cases}$$

For example, consider the CRC-CCITT generated by $F(X) = X^{16} + X^{12} + X^5 + 1$, i.e., $h = 16$, $i_2 = 12$,
and $i_1 = 5$. Assume that $s = 8$. We then have $e_b/e = (4 + 5.5 \times 8)/(10 + 5.5 \times 4) = 48/32 = 1.5$. Thus, for
the 16-bit CRC-CCITT, the new technique is 50% faster than the basic technique.

Next, consider the CRC-16 generated by $F(X) = X^{16} + X^{15} + X^2 + 1$, i.e., $h = 16$, $i_2 = 15$, and $i_1 = 2$.
Assume also that $s = 8$. We then have $e_b/e = (4 + 5.5 \times 8)/(10 + 5.5 \times 7) = 48/48.5$. Thus, for the CRC-16,
the new technique is slightly slower than the basic technique.

Case 3: h > h - s. Using the new technique, we have B{X) = Rf(x) [A{X){X'-^ + 1)] = 

Rf{x) [A{X)X'^] + Rf{x) [A{X)X*^] +A{X). Substituting k = 2 into (75) yields 

£6 _ r (4-^5.5s)/(9-F5.5[i2-(/i-s)]) if /i = 8, 16, 32, 64 
e ~ \ (4 -I- 5.5s)/(10 -I- 5.5[i2 - (/i - s)]) if /i 7^ 8, 16, 32, 64 
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Fig. 28 The period of X'' + X^^ + X'l + 1. 
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C.4 CRC Generator Polynomials of Weights Greater Than 4 



We now consider the case $k \ge 3$, i.e., the CRC generator polynomial

$$F(X) = X^h + X^{i_k} + X^{i_{k-1}} + \cdots + X^{i_1} + 1 \qquad (86)$$

has weight greater than 4, i.e., it contains more than 4 terms. Our goal here is to find generator polynomials for CRCs that (a) have minimum distance $d_{min} > 4$ and (b) can be efficiently implemented by the new technique (15), i.e., they have low complexity. Codes with minimum distance $d_{min}$ detect all patterns of fewer than $d_{min}$ errors. For example, the fast CRCs generated by $F_h(X) = X^h + X^2 + X + 1$, which have $d_{min} = 4$, detect all patterns of 1, 2, and 3 errors.

An error pattern $E(X)$ is detected by the CRC generated by $F(X)$ if $E(X)$ is not a multiple of $F(X)$, i.e., $R_{F(X)}[E(X)] \ne 0$ (see the proof of Theorem 2). This fact can be used to search for CRCs that can detect specified sets of error patterns.
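The detection criterion can be sketched in a few lines of C. The representation below (a polynomial stored as a bit mask, bit $i$ holding the coefficient of $X^i$) and the names `gf2_degree`, `gf2_mod`, and `error_detected` are assumptions made for illustration, not part of the paper.

```c
#include <stdint.h>

/* Degree of a GF(2) polynomial stored as a bit mask; returns -1 for zero. */
static int gf2_degree(uint64_t p)
{
    int d = -1;
    while (p) { p >>= 1; d++; }
    return d;
}

/* Remainder of e(X) divided by f(X) over GF(2); f must be nonzero. */
uint64_t gf2_mod(uint64_t e, uint64_t f)
{
    int df = gf2_degree(f);
    int de;
    while ((de = gf2_degree(e)) >= df)
        e ^= f << (de - df);          /* cancel the leading term of e */
    return e;
}

/* An error pattern is detected iff it is not a multiple of the generator. */
int error_detected(uint64_t e, uint64_t f)
{
    return gf2_mod(e, f) != 0;
}
```

With this encoding, the generator $X^{16} + X^8 + X^4 + X^3 + X + 1$ is the mask `0x1011B`; any shifted copy of the generator is reported as undetected, while a single-bit error is reported as detected.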

In the following, for a given $m > 0$, we search for $h$-bit CRCs of length $l$ that can detect all patterns of $m$ errors:

$$E(X) = X^{a_{m-1}} + X^{a_{m-2}} + \cdots + X^{a_1} + X^{a_0}$$

where $l > a_{m-1} > a_{m-2} > \cdots > a_1 > a_0 \ge 0$. There are $\binom{l}{m}$ such $m$-error patterns. Let $l_m$ be the maximum length of a CRC that can detect all patterns of $m$ errors. A CRC with $d_{min} = m + 1$ detects all patterns of $m$ errors or less, and fails to detect some patterns of $m + 1$ errors. Thus, a CRC will have $d_{min} = m + 1$ if its total code length $= \min\{l_1, l_2, \ldots, l_m\}$ bits, and $d_{min} \ge m + 1$ if its total code length $\le \min\{l_1, l_2, \ldots, l_m\}$ bits. Note that $l_2$ of a CRC is also its period. If $F(X)$ in (86) has even weight, then all patterns of odd number of errors are detected, i.e., $l_m = \infty$ for odd $m$. Thus, for odd $m$, the CRC has $d_{min} = m + 1$ if its total code length $= \min\{l_2, l_4, \ldots, l_{m-1}\}$ bits. For example, suppose that $k$ is even, i.e., $F(X)$ in (86) has even weight. The CRC generated by $F(X)$ then has $d_{min} = 6$ if its total code length $= \min\{l_2, l_4\}$ bits. Further, this CRC has $d_{min} = 8$ if its total code length $= \min\{l_2, l_4, l_6\}$ bits.

A straightforward technique to show that a CRC of length $l$ will detect all $m$-error patterns is to verify that each of such $\binom{l}{m}$ $m$-error patterns is not a multiple of the CRC generator polynomial. This brute-force technique has computational complexity $O(l^m)$ [10]. A faster technique, which has computational complexity $O(l^{m-1})$, is presented in Remark 17. As an example, Fig. 29 shows some $h$-bit CRC generator polynomials of weight 6, for $h = 16$, 24, 32, and 64. These CRCs and their $l_2$ and $l_4$ are found by computer search. For each value of $h$, these CRC generator polynomials are arranged according to increasing $l_4$ (see Remark 18). As discussed above, these CRCs detect all odd numbers of errors because their generator polynomials have even weights. Further, these CRCs have $d_{min} = 6$ when their total code lengths $l = \min\{l_2, l_4\}$.
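Because $l_2$ is the period of the generator polynomial, it can be found by repeated multiplication by $X$ modulo $F(X)$. The routine below is a minimal sketch under stated assumptions: the bit-mask encoding and the name `gf2_period` are illustrative, and this is not claimed to be the search program used by the authors.

```c
#include <stdint.h>

/* Period of f(X): the smallest n > 0 with X^n = 1 (mod f(X)).
   f is a bit mask of degree h (1 <= h <= 32) with a nonzero constant term,
   which guarantees that the loop terminates. */
uint64_t gf2_period(uint64_t f, int h)
{
    uint64_t top = (uint64_t)1 << h;
    uint64_t r = 1;                /* r = X^n mod f, starting at n = 0 */
    uint64_t n = 0;
    do {
        r <<= 1;                   /* multiply by X */
        if (r & top) r ^= f;       /* reduce modulo f */
        n++;
    } while (r != 1);
    return n;
}
```

As a sanity check, the CRC-16 polynomial $X^{16} + X^{15} + X^2 + 1$ (mask `0x18005`) has period $2^{15} - 1$, in line with the introduction. The running time grows with the period, so the loop is practical only for moderate $h$.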

In the following examples, we discuss the performance and implementation of some CRCs that are generated by the polynomials shown in Fig. 29. Here, we have $k = 4$ and $F(X) = X^h + X^{i_4} + X^{i_3} + X^{i_2} + X^{i_1} + 1$. We assume that the input message is divided into $n$ tuples $Q_0(X), Q_1(X), \ldots, Q_{n-1}(X)$. Each tuple $Q_i(X)$ has $s$ bits (e.g., $s = 8$, 16, 32 and 64). Recall that the $h$-bit CRC generated by (86) can be computed either by the basic technique (see Definition 1) or by the new technique (15). Let $e$ and $e_b$ denote the operation count per input byte required for computing the CRC check tuple under the new technique and under the basic technique, respectively. Thus, the new technique is $e_b/e$ times faster than the basic technique. The calculation of $e$, $e_b$, and $e_b/e$ for general CRC generator polynomials is given in Section C.1. As shown in the following examples, with proper choice of $s$ for the CRCs, the new technique can be much faster than the basic technique.

CRC generator polynomial                    l_4       l_2 = period    (2^(h-1) - 1)/period

X^16 + X^4 + X^3 + X^2 + X + 1              17        31620           1.03627
X^16 + X^5 + X^3 + X^2 + X + 1              67        534             61.3614
X^16 + X^6 + X^5 + X^2 + X + 1              74        12264           2.6718
X^16 + X^7 + X^6 + X^? + X^3 + 1            77        28658           1.14338
X^16 + X^8 + X^4 + X^3 + X + 1              115       28658           1.14338
X^16 + X^? + X^? + X^? + X^? + 1            128       254             129.004
X^16 + X^? + X^11 + X^? + X^2 + 1           130       258             127.004

X^24 + X^4 + X^3 + X^2 + X + 1              25        1048572         8.00003
X^24 + X^5 + X^3 + X^2 + X + 1              461       2446675         3.42857
X^24 + X^? + X^6 + X^? + X^2 + 1            530       344043          24.3824
X^24 + X^? + X^6 + X^? + X^3 + 1            561       2046            4100
X^24 + X^8 + X^5 + X^4 + X^2 + 1            691       8388607         1
X^24 + X^14 + X^13 + X^11 + X^10 + 1        1024      2046            4100
X^24 + X^16 + X^12 + X^10 + X^? + 1         1030      7161            1171.43
X^24 + X^16 + X^15 + X^9 + X^8 + 1          2048      4094            2049
X^24 + X^? + X^13 + X^? + X^? + 1           2050      4098            2047

X^32 + X^4 + X^3 + X^2 + X + 1              33        1610612724      1.33333
X^32 + X^5 + X^3 + X^2 + X + 1              2948      133693185       16.0628
X^32 + X^? + X^? + X^3 + X + 1              3258      805306362       2.66667
X^32 + X^? + X^? + X^3 + X^2 + 1            3501      2139094785      1.00392
X^32 + X^? + X^? + X^? + X^2 + 1            4145      1761607470      1.21905
X^32 + X^11 + X^10 + X^9 + X^? + 1          4198      1408426068      1.52474
X^32 + X^? + X^? + X^? + X^? + 1            4480      2013265905      1.06667
X^32 + X^12 + X^8 + X^4 + X^3 + 1           4856      2147483647      1
X^32 + X^17 + X^15 + X^13 + X^2 + 1         4989      2147483647      1
X^32 + X^18 + X^17 + X^15 + X^14 + 1        32770     65538           32767

X^64 + X^4 + X^3 + X^2 + X + 1              65        2.69 x 10^18    3.42857
X^64 + X^5 + X^3 + X^2 + X + 1              > 10^?    3.46 x 10^18    2.66797

Fig. 29 CRC generator polynomials of weight 6
($d_{min} \ge 4$ if total lengths $\le l_2$, and $d_{min} = 6$ if total lengths $\le \min\{l_2, l_4\}$).
(A "?" marks an exponent that is illegible in the source.)



Example 1. Consider the 16-bit CRC generated by $X^{16} + X^8 + X^4 + X^3 + X + 1$. From Fig. 29, we have $l_4 = 115$ and $l_2 = 28658$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 115$ bits, and (b) up to 3 errors if its total length $\le 28658$ bits. In other words, this CRC has $d_{min} = 6$ if its total length $\le 115$ bits, and $d_{min} = 4$ if its total length $\le 28658$ bits. Here, we have $h = 16$, $k = 4$, $i_k = i_4 = 8$. For implementation, we consider 2 cases: $s = 8$ and 16.

Case 1: $s = 8 < h$. Then $8 = i_4 = i_k \le h - s = 8$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 8}{6 + 2 \times 4} = \frac{48}{14} = 3.43$$

i.e., the new technique is 3.43 times faster than the basic technique when $s = 8$.

Case 2: $s = 16$. Because $s \ge h$, the results of Section C.1.1 can be used in this case. From (65), we have

$$\frac{e_b}{e} = \frac{3 + 5.5s}{3 + 5.5(s - h + i_k) + k} = \frac{3 + 5.5 \times 16}{3 + 5.5 \times 8 + 4} = \frac{91}{51} = 1.78$$

i.e., the new technique is only 1.78 times faster than the basic technique when $s = 16$.

Thus, under the new technique, using $s = 8$ is much faster than using $s = 16$ for this CRC. The C program for computing the CRC check bits using $s = 8$ is shown in Fig. 30.

    unsigned short CRC16 (int n, unsigned char *Q)
    {
        int i, hs, s;
        unsigned short A, B, P;
        s = 8;
        hs = 8;                              /* hs = h-s, h = 16 */

        P = 0;
        for (i=0; i<n; i=i+1)                /* 2 */
        {
            A = (P>>hs) ^ Q[i];              /* 2 */
            B = (A<<8) ^ (A<<4) ^
                (A<<3) ^ (A<<1) ^ A;         /* 8 */
            P = B ^ (P<<s);                  /* 2 */
        }

        return P;
    }

Fig. 30 C program with s = 8 for 16-bit CRC generated by $X^{16} + X^8 + X^4 + X^3 + X + 1$.

□
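One way to gain confidence in a routine like the one in Fig. 30 is to compare it with plain bit-by-bit polynomial division. The sketch below assumes fixed-width types from `<stdint.h>`, a zero initial value, and most-significant-bit-first processing; the names `crc16_fast` and `crc16_bitwise` are hypothetical and are not taken from the paper.

```c
#include <stdint.h>
#include <stddef.h>

/* Fast technique with s = 8 for F(X) = X^16 + X^8 + X^4 + X^3 + X + 1,
   mirroring the structure of Fig. 30. */
uint16_t crc16_fast(const uint8_t *q, size_t n)
{
    uint16_t p = 0;
    for (size_t i = 0; i < n; i++) {
        uint16_t a = (uint16_t)((p >> 8) ^ q[i]);
        /* a has degree < 8, so a * (X^8 + X^4 + X^3 + X + 1) needs no reduction */
        uint16_t b = (uint16_t)((a << 8) ^ (a << 4) ^ (a << 3) ^ (a << 1) ^ a);
        p = (uint16_t)(b ^ (p << 8));
    }
    return p;
}

/* Reference: bit-by-bit division of (message * X^16) by F(X). */
uint16_t crc16_bitwise(const uint8_t *q, size_t n)
{
    uint16_t p = 0;
    for (size_t i = 0; i < n; i++) {
        p ^= (uint16_t)(q[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            p = (p & 0x8000) ? (uint16_t)((p << 1) ^ 0x011B)
                             : (uint16_t)(p << 1);
    }
    return p;
}
```

Both functions should return the same check tuple for any input; the fast version replaces the inner 8-step loop with five shifts and XORs.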

Example 2. Consider the 24-bit CRC generated by $X^{24} + X^8 + X^5 + X^4 + X^2 + 1$. From Fig. 29, we have $l_4 = 691$ and $l_2 = 8388607$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 691$ bits, and (b) up to 3 errors if its total length $\le 8388607$ bits. Here, we have $k = 4$, $i_k = i_4 = 8$, and $h = 24 \ne 8, 16, 32, 64$. For implementation, we consider 2 cases: $s = 8$ and 16.

Case 1: $s = 8 < h$. We have $8 = i_4 = i_k \le h - s = 16$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{7 + 2k} = \frac{4 + 5.5 \times 8}{7 + 2 \times 4} = \frac{48}{15} = 3.20$$

Case 2: $s = 16 < h$. We have $8 = i_4 = i_k \le h - s = 8$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{7 + 2k} = \frac{4 + 5.5 \times 16}{7 + 2 \times 4} = \frac{92}{15} = 6.13$$

Thus, under the new technique, using $s = 16$ is much faster than using $s = 8$ for this CRC. □

Example 3. Consider the 24-bit CRC generated by $X^{24} + X^{16} + X^{15} + X^9 + X^8 + 1$. From Fig. 29, we have $l_4 = 2048$ and $l_2 = 4094$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 2048$ bits, and (b) up to 3 errors if its total length $\le 4094$ bits. Here, we have $k = 4$, $i_k = i_4 = 16$, and $h = 24 \ne 8, 16, 32, 64$. For implementation, we consider 2 cases: $s = 8$ and 16.

Case 1: $s = 8 < h$. We have $16 = i_4 = i_k \le h - s = 16$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{7 + 2k} = \frac{4 + 5.5 \times 8}{7 + 2 \times 4} = \frac{48}{15} = 3.20$$

Case 2: $s = 16 < h$. We have $16 = i_4 = i_k > h - s = 8$. Thus, the results of Section C.1.2.2 can be used in this case. Further, consider Case 1 of Section C.1.2.2, which requires the existence of an $m$ such that $i_{m+1} > h - s \ge i_m$. We have $9 = i_2 > h - s = 8 \ge i_1 = 8$. Thus, $m = 1$. Using (73), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{7 + 5.5[i_k - (h - s)] + k + m + 1} = \frac{4 + 5.5 \times 16}{7 + 5.5 \times (16 - 8) + 4 + 1 + 1} = \frac{92}{57} = 1.61$$

Thus, under the new technique, using $s = 8$ is much faster than using $s = 16$ for this CRC. □

Example 4. Consider the 32-bit CRC generated by $X^{32} + X^{12} + X^8 + X^4 + X^3 + 1$. From Fig. 29, we have $l_4 = 4856$ and $l_2 = 2147483647$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 4856$ bits, and (b) up to 3 errors if its total length $\le 2147483647$ bits. Here, we have $h = 32 = \mathrm{degree}(F(X))$, $k = 4$, $i_k = i_4 = 12$, $i_3 = 8$, $i_2 = 4$, and $i_1 = 3$. For implementation, we consider 3 cases: $s = 8$, 16, and 32.

Case 1: $s = 8 < h$. Then $12 = i_4 = i_k \le h - s = 24$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 8}{6 + 2 \times 4} = \frac{48}{14} = 3.43$$

Case 2: $s = 16 < h$. Then $12 = i_4 = i_k \le h - s = 16$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 16}{6 + 2 \times 4} = \frac{92}{14} = 6.57$$

Case 3: $s = 32 \ge h$. Then $12 = i_4 = i_k > h - s = 0$. Thus, the results of Section C.1.2.2 can be used in this case. Further, we have $3 = i_1 > h - s = 0$. Thus, Case 2 of Section C.1.2.2 is applicable here. Using (75), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 5.5[i_k - (h - s)] + k + 1} = \frac{4 + 5.5 \times 32}{6 + 5.5 \times 12 + 4 + 1} = \frac{180}{77} = 2.34$$

Thus, under the new technique, using $s = 16$ is much faster than using $s = 8$ and 32 for this CRC. The C program for computing the CRC check bits using $s = 16$ is shown in Fig. 31.



    unsigned int CRC32 (int n, unsigned short *Q)
    {
        int i, hs, s;
        unsigned int A, B, P;
        s = 16;
        hs = 16;                             /* hs = h-s, h = 32 */

        P = 0;
        for (i=0; i<n; i=i+1)                /* 2 */
        {
            A = (P>>hs) ^ Q[i];              /* 2 */
            B = (A<<12) ^ (A<<8) ^
                (A<<4) ^ (A<<3) ^ A;         /* 8 */
            P = B ^ (P<<s);                  /* 2 */
        }

        return P;
    }

Fig. 31 C program with s = 16 for the 32-bit CRC generated by $X^{32} + X^{12} + X^8 + X^4 + X^3 + 1$.

□

Example 5. Consider the 32-bit CRC generated by $F(X) = X^{32} + X^{18} + X^{17} + X^{15} + X^{14} + 1$ (= 10006c001 in hexadecimal notation). From Fig. 29, we have $l_4 = 32770$ and $l_2 = 65538$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 32770$ bits, and (b) up to 3 errors if its total length $\le 65538$ bits. Here, we have $h = 32$, $k = 4$, $i_k = i_4 = 18$, $i_3 = 17$, $i_2 = 15$, and $i_1 = 14$. For implementation, we consider 3 cases: $s = 8$, 16, and 32.

Case 1: $s = 8 < h$. Then $18 = i_4 = i_k \le h - s = 24$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 8}{6 + 2 \times 4} = \frac{48}{14} = 3.43$$

i.e., for bitwise implementation, the 32-bit CRC generated by $F(X)$ is 3.43 times faster than basic 32-bit CRCs. The C program for bitwise implementation and $s = 8$ for this CRC is shown in Fig. 32.

Case 2: $s = 16 < h$. First, consider bitwise implementation. Because $18 = i_k > h - s = 16$, the results of Section C.1.2.2 can be used in this case. Further, consider Case 1 of Section C.1.2.2, which requires the existence of an $m$ such that $i_{m+1} > h - s \ge i_m$. We have $17 = i_3 > h - s = 16 \ge i_2 = 15$. Thus, $m = 2$. Using (73), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 5.5[i_k - (h - s)] + k + m + 1} = \frac{4 + 5.5 \times 16}{6 + 5.5 \times (18 - 16) + 4 + 2 + 1} = \frac{92}{24} = 3.83$$

Using (72), we have $r = 5.5[i_k - (h - s)] + k + m + 1 = 18$, which is substituted into (67) to show that the operation count per input byte required for computing the CRC check tuple under the bitwise new technique is $e = 12$. As shown below, by using a table of only 4 entries, $e$ can be reduced to 8.5.

We now discuss table-lookup implementation for the CRC generated by $F(X) = X^{32} + X^{18} + X^{17} + X^{15} + X^{14} + 1$. We can implement the table lookup for this CRC by imitating the table-lookup implementation presented in Section A.2.2 for the fast CRCs generated by $F_h(X) = X^h + X^2 + X + 1$. Using the new technique (15), we have

$$\begin{aligned} B(X) &= R_{F(X)}[A(X)(X^{18} + X^{17} + X^{15} + X^{14} + 1)] \\ &= R_{F(X)}[A(X)(X^{18} + X^{17})] + A(X)(X^{15} + X^{14} + 1) \end{aligned}$$

We now decompose $A(X)$ into 2 simpler polynomials $A_1(X)$ and $A_2(X)$:

$$A(X) = A_1(X)X^{14} + A_2(X)$$

where $\mathrm{degree}(A_1(X)) < 2$ and $\mathrm{degree}(A_2(X)) < 14$. We then have

$$\begin{aligned} B(X) &= R_{F(X)}[(A_1(X)X^{14} + A_2(X))(X^{18} + X^{17})] + A(X)(X^{15} + X^{14} + 1) \\ &= R_{F(X)}[A_1(X)X^{14}(X^{18} + X^{17})] + A_2(X)(X^{18} + X^{17}) + A(X)(X^{15} + X^{14} + 1) \\ &= R_{F(X)}[A_1(X)X^{31}(X + 1)] + A_2(X)(X^{18} + X^{17}) + A(X)(X^{15} + X^{14} + 1) \\ &= T[A_1] + A_2(X)(X^{18} + X^{17}) + A(X)(X^{15} + X^{14} + 1) \end{aligned}$$

where $T[\,]$ is the table defined by

$$T[A_1] = R_{F(X)}[A_1(X)X^{31}(X + 1)]$$

where $A_1$ is a 2-tuple. Thus, table $T[\,]$ has 4 entries, which can be shown to be:

$$\begin{aligned} T[0] &= 0 \\ T[1] &= X^{31} + X^{18} + X^{17} + X^{15} + X^{14} + 1 \\ T[2] &= X^{19} + X^{17} + X^{16} + X^{14} + X + 1 \\ T[3] &= X^{31} + X^{19} + X^{18} + X^{16} + X^{15} + X \end{aligned}$$

In hexadecimal notation, we have T[1] = 8006c001, T[2] = b4003, and T[3] = 800d8002. The C program, which includes the table of 4 entries, for this CRC is shown in Fig. 33. Recall from Appendix A that $r$ denotes the operation count required for computing $B(X)$. From Fig. 33, we have $r = 11$, which is substituted into (67) to show that the operation count per input byte required for computing the CRC check tuple under the new technique with table lookup is $e = 8.5$ (as compared to $e = 12$ for the case of bitwise implementation).
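The four table entries can be regenerated directly from the definition $T[A_1] = R_{F(X)}[A_1(X)X^{31}(X + 1)]$. The following is a sketch under the assumption that polynomials are stored as bit masks in 64-bit integers; the helper names `mod_f` and `build_table` are hypothetical.

```c
#include <stdint.h>

/* Remainder of p(X) modulo the degree-32 generator
   F(X) = X^32 + X^18 + X^17 + X^15 + X^14 + 1 (mask 0x10006C001). */
static uint32_t mod_f(uint64_t p)
{
    const uint64_t f = 0x10006C001ULL;
    for (int d = 63; d >= 32; d--)
        if (p & ((uint64_t)1 << d))
            p ^= f << (d - 32);       /* cancel the term X^d */
    return (uint32_t)p;
}

/* Fill t[a1] = R_F[ a1(X) * X^31 * (X + 1) ] for every 2-bit tuple a1. */
void build_table(uint32_t t[4])
{
    for (unsigned a1 = 0; a1 < 4; a1++) {
        uint64_t prod = 0;
        if (a1 & 1) prod ^= (uint64_t)3 << 31;   /* 1 * (X^32 + X^31) */
        if (a1 & 2) prod ^= (uint64_t)3 << 32;   /* X * (X^32 + X^31) */
        t[a1] = mod_f(prod);
    }
}
```

Running `build_table` should give the same values as the hexadecimal entries quoted in the text, which is a convenient cross-check before hard-coding the table.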
Case 3: $s = 32 = h$. Using (65), we have

$$\frac{e_b}{e} = \frac{3 + 5.5s}{3 + 5.5(s - h + i_k) + k} = \frac{3 + 5.5 \times 32}{3 + 5.5 \times 18 + 4} = \frac{179}{106} = 1.69$$

Thus, under the new technique, using $s = 32$ is slower than using $s = 8$ and 16 for the CRC generated by $F(X)$.

    unsigned int CRC32 (int n, unsigned char *Q)
    {
        int i, hs, s;
        unsigned int A, B, P;
        s = 8;
        hs = 24;                             /* hs = h-s, h = 32 */

        P = 0;
        for (i=0; i<n; i=i+1)                /* 2 */
        {
            A = (P>>hs) ^ Q[i];              /* 2 */
            B = (A<<18) ^ (A<<17) ^ (A<<15) ^
                (A<<14) ^ A;                 /* 8 */
            P = B ^ (P<<s);                  /* 2 */
        }

        return P;
    }

Fig. 32 C program (s = 8, without table lookup) for the 32-bit CRC generated by $X^{32} + X^{18} + X^{17} + X^{15} + X^{14} + 1$
($d_{min} = 6$ if total length $\le 32770$ bits, and $d_{min} = 4$ if total length $\le 65538$ bits).



    unsigned int CRC32_table (int n, unsigned short *Q)
    {
        int i, hs, s;
        unsigned int A, A1, A2, B, P;
        static unsigned int T[4] =
            {0x0, 0x8006c001, 0xb4003, 0x800d8002};
        s = 16;
        hs = 16;                                 /* hs = h-s, h = 32 */

        P = 0;
        for (i=0; i<n; i=i+1)                    /* 2 */
        {
            A = (P>>hs) ^ Q[i];                  /* 2 */
            A1 = A >> 14;                        /* 1 */
            A2 = A & 0x3fff;                     /* 1 */
            B = T[A1] ^ (A2<<18) ^ (A2<<17) ^
                (A<<15) ^ (A<<14) ^ A;           /* 9 */
            P = B ^ (P<<s);                      /* 2 */
        }

        return P;
    }

Fig. 33 C program (s = 16, with 4-entry table lookup) for the 32-bit CRC generated by $X^{32} + X^{18} + X^{17} + X^{15} + X^{14} + 1$
($d_{min} = 6$ if total length $\le 32770$ bits, and $d_{min} \ge 4$ if total length $\le 65538$ bits).

□

Example 6. Consider the 64-bit CRC generated by $X^{64} + X^5 + X^3 + X^2 + X + 1$. From Fig. 29, we have $l_4 > 10^{?}$ and $l_2 = 3.46 \times 10^{18}$. Thus, this CRC detects (a) up to 5 errors if its total length $\le 10^{?}$ bits, and (b) up to 3 errors if its total length $\le 3.46 \times 10^{18}$ bits. Here, we have $h = 64$, $k = 4$, $i_k = i_4 = 5$. For implementation, we consider 4 cases: $s = 8$, 16, 32 and 64.

Case 1: $s = 8 < h$. We have $5 = i_4 = i_k \le h - s = 56$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 8}{6 + 2 \times 4} = \frac{48}{14} = 3.43$$

Case 2: $s = 16 < h$. We have $5 = i_4 = i_k \le h - s = 48$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 16}{6 + 2 \times 4} = \frac{92}{14} = 6.57$$

Case 3: $s = 32 < h$. We have $5 = i_4 = i_k \le h - s = 32$. Thus, the results of Section C.1.2.1 can be used in this case. From (70), we have

$$\frac{e_b}{e} = \frac{4 + 5.5s}{6 + 2k} = \frac{4 + 5.5 \times 32}{6 + 2 \times 4} = \frac{180}{14} = 12.86$$

Case 4: $s = 64$. Because $s \ge h$, the results of Section C.1.1 can be used in this case. From (65), we have

$$\frac{e_b}{e} = \frac{3 + 5.5s}{3 + 5.5(s - h + i_k) + k} = \frac{3 + 5.5 \times 64}{3 + 5.5 \times 5 + 4} = \frac{355}{34.5} = 10.29$$

Thus, under the new technique, using $s = 32$ is much faster than using other values of $s$ for the CRC generated by $X^{64} + X^5 + X^3 + X^2 + X + 1$. □

Fig. 29 shows $l_2$ and $l_4$ for CRC generator polynomials that have weight 6. Fig. 34 shows $l_2$ and $l_4$ for
CRC generator polynomials that have weights greater than 6. Although we have $l_2 > l_4$, i.e., $\min(l_2, l_4) = l_4$
for all the CRCs in Fig. 29, this may not be true for all the CRCs in Fig. 34, e.g., $l_2 = 151$ and $l_4 = 152$ (i.e.,
h < h) for the CRC generated by + X^^ + X^^ + X^'^ + X^ + X^ + X + 1. Note that our search also 
produces the "CRC32sub8" and "CRC32subl6" polynomials presented in [19]: X^^ + X"^ + X'^ + X^ + X^ + 1 
(Fig. 29) and + x^^ + X^^ + X^° + X^ + X^ + X'^ + 1 (Fig. 34). Without using table lookup, the CRCs 
generated by other CRC32sub8 and CRC32subl6 polynomials in [19] can also be efficiently implemented by 
the fast technique (15) with s < 24 and s < 16, respectively. 

In Fig. 35, we present CRC generator polynomials of weight 5, i.e., $k = 3$. These polynomials generate
CRCs that have $d_{min} = 5$ if their total code length $\le \min\{l_2, l_3, l_4\}$ bits. Let us compare the CRCs in Fig. 35
with those in Fig. 29. First, the largest values of $l_4$ for $h = 16$ and 32 in Fig. 35 are almost twice those
in Fig. 29. Next, while Im = 00 for the CRCs in Fig. 29 for odd m, we have < 00 for those in Fig. 35. 
The CRC generator polynomials of odd weights greater than 5 are given in Fig. 36. In particular, I4 for 
^24 _^ ^14 ^ ^13 ^ j^i2 ^ ^11 + 1 in Fig. 36 is almost twice that for X^* + + X^^ + X'^ + 1 in 

Fig. 35. 

In Fig. 37, we present fast $h$-bit CRCs that are generated by primitive polynomials of weight 5, i.e., $F(X) = X^h + X^{i_3} + X^{i_2} + X^{i_1} + 1$. We have $l_2 = 2^h - 1$, because the polynomials are primitive. These CRCs have fast implementation when $s$ is chosen such that $s \le h - i_3$ or $i_3 \le h - s$ (see Section C.1.2.1). Note that Fig. 37 includes some polynomials in Fig. 35, as well as the CRC-64-ISO polynomial $X^{64} + X^4 + X^3 + X + 1$.

Let us compare the polynomial X^"^ + X"^ + X^ + X"^ + 1 in Fig. 37 with the popular CRC-32-IEEE 802.3 
primitive polynomial (74). First, both these polynomials have the same maximum period $l_2 = 2^{32} - 1$. Using
computer search, it can be shown that the CRC-32-IEEE 802.3 polynomial has $l_4 = 3006$ and $l_3 = 91639$
[10], which are smaller than I4 = 5281 and I3 = 142741 for X32 + X^' + X^ + X^ + 1. Thus, the 32-bit CRC 
generated by X^^ + X*" + X^ + X^ + 1 is both faster and more effective (i.e., for patterns of 3 and 4 errors) 
than the CRC-32-IEEE 802.3. 

So far, we have presented polynomials that generate CRCs that have $d_{min} = 5$ (see Figs. 35 and 36) and $d_{min} = 6$ (see Figs. 29 and 34). Generator polynomials for CRCs that have $d_{min} > 6$ can also be found. For example, Fig. 38 shows polynomials of weight 8, which generate CRCs that have $d_{min} = 8$ if their total code lengths $\le \min\{l_2, l_4, l_6\}$ bits, because $l_m = \infty$ for odd $m$. Note that these same CRCs have $d_{min} \ge 4$ and $d_{min} \ge 6$ if their total code lengths $\le l_2$ and $\le \min\{l_2, l_4\}$ bits, respectively. Similar to the CRCs presented in Examples 1-6, many CRCs in Fig. 38 also have fast implementation. However, they are usually not as fast as the CRCs in those examples, because they are generated by polynomials that have greater weights (see Figs. 29 and 38).

There are also many other CRCs, which are not presented here, that can be efficiently implemented by 
the fast technique (15). For example. Fig. 38 shows the 32-bit CRC generated by X^'^ + X^^ + X^^ + X^° + 
X^ + X^ + X + 1, which has k = 301, I4 = 3298 and h = 2147483644 « 2=^^ - 1. Not shown in Fig. 38 is 
an alternative 32-bit CRC generated by X^^ + X^^ + X^^ + X" + X^ + X'"^ + X^ + 1, which has k = 255, 
I4 = 3509, and h = 2147483647 = 2^1 - 1. These 2 CRCs have almost identical I2, but have different k and 
I4. They can also be efficiently implemented by the fast technique (15) with s = 16 or s = 8. 
Remark 16. As shown in Fig. 29, the 32-bit CRC generated by $X^{32} + X^4 + X^3 + X^2 + X + 1$ has $l_4 = 33$, i.e., this CRC fails to detect at least one pattern of 4 errors when its total length exceeds 33 bits. Similarly, the 64-bit CRC generated by $X^{64} + X^4 + X^3 + X^2 + X + 1$ has $l_4 = 65$. Note that these polynomials consist of $X^h$ and consecutive powers of $X$. We now show in general that such polynomials have $l_4 = h + 1$. Thus, consider the $h$-bit CRC generated by $F(X) = X^h + X^n + X^{n-1} + \cdots + X^3 + X^2 + X + 1$, which has weight $n + 2$ and consists of $X^h$ and consecutive powers of $X$, namely, $X^n, X^{n-1}, \ldots, X^3, X^2, X, 1$. Here, we assume that $3 \le n \le h - 2$. We have $l_2, l_3, l_4 \ge h + 1$ because $F(X)$ has weight greater than 4. The 4-error pattern $E(X) = X^{h+1} + X^h + X^{n+1} + 1$ is a multiple of $F(X)$ because $F(X)(X + 1) = E(X)$, which implies that $l_4 \le h + 1$. Thus, $l_4 = h + 1$. Note that $l_1 = \infty$ because $F(X)$ is not a multiple of $X$. Thus, $\min(l_1, l_2, l_3, l_4) = l_4 = h + 1$.

Case 1: $n$ is even. Then $l_1 = l_3 = l_5 = \infty$ because $F(X)$ has even weight. The $h$-bit CRC generated by $F(X)$ then has minimum distance $d_{min} = 6$ when its total length $= \min(l_1, l_2, l_3, l_4, l_5) = \min(l_2, l_4) = l_4 = h + 1$ bits. This CRC has $d_{min} \ge 4$ when its total length $\le \min(l_1, l_2, l_3) = l_2$ bits. Recall that $l_2$ is also the period of $F(X)$.

Case 2: $n$ is odd. The $h$-bit CRC generated by $F(X)$ then has $d_{min} = 5$ when its total length $= \min(l_1, l_2, l_3, l_4) = l_4 = h + 1$ bits. This CRC has $d_{min} \ge 3$ when its total length $\le \min(l_1, l_2) = l_2$ bits. □
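The identity $F(X)(X + 1) = X^{h+1} + X^h + X^{n+1} + 1$ used in Remark 16 is easy to check numerically. The sketch below assumes polynomials are stored as 64-bit masks with $h \le 62$; the function names are hypothetical.

```c
#include <stdint.h>

/* F(X) = X^h + X^n + X^(n-1) + ... + X + 1 as a bit mask
   (assumes 3 <= n <= h - 2 and h <= 62). */
uint64_t consecutive_poly(int h, int n)
{
    return ((uint64_t)1 << h) | (((uint64_t)1 << (n + 1)) - 1);
}

/* Multiply a GF(2) polynomial by (X + 1). */
uint64_t times_x_plus_1(uint64_t f)
{
    return f ^ (f << 1);
}

/* Number of nonzero coefficients of a GF(2) polynomial. */
int gf2_weight(uint64_t p)
{
    int w = 0;
    while (p) { w += (int)(p & 1); p >>= 1; }
    return w;
}
```

For any admissible $h$ and $n$, `gf2_weight(times_x_plus_1(consecutive_poly(h, n)))` should be 4, since the consecutive low-order terms telescope and only $X^{n+1}$ and 1 survive.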

Remark 17. Consider a CRC that is generated by $F(X)$ and has total length of $l$ bits. When a CRC codeword of length $l$ is transmitted, it is affected by errors. Let us focus on the patterns of $m$ errors. There are $\binom{l}{m}$ such patterns of $m$ errors. A straightforward technique to show that the CRC detects all the $m$-error patterns is to verify that it detects each of the $\binom{l}{m}$ $m$-error patterns. However, we show below that, to verify that this CRC detects all the $m$-error patterns, it is sufficient to verify that it detects all the error patterns from a subset of only $\binom{l-1}{m-1}$ patterns of $m$ errors. First, let $E(X)$ be an error polynomial. We can write

$$E(X) = X^a E^*(X)$$

where $a \ge 0$ and $E^*(X)$ is a polynomial whose least significant bit is 1, i.e., $E^*(0) = 1$. For example, let
E{X) =X^ + We then have a = 2 and E*{X) =X^ + 1, because E{X) = X^{X^ + 1). 

Next, we show that $E(X)$ is undetected iff $E^*(X)$ is undetected. Thus, suppose that the error pattern $E(X)$ is undetected, i.e., it is a codeword of the CRC generated by $F(X)$. We then have $E(X) = K(X)F(X)$ for some polynomial $K(X)$, which implies $X^a E^*(X) = K(X)F(X)$. Because we assume that $F(X)$ is not a multiple of $X$, i.e., $\gcd(X, F(X)) = 1$, we must have $E^*(X) = K^*(X)F(X)$ for some polynomial $K^*(X)$. Thus, $E^*(X)$ is also a codeword, i.e., it is an undetected error pattern. Suppose now that $E^*(X)$ is undetected, i.e., $E^*(X)$ is a multiple of $F(X)$. Then $E(X) = X^a E^*(X)$ must also be undetected, because $E(X)$ is also a multiple of $F(X)$. To summarize, $E(X)$ is undetected iff $E^*(X)$ is undetected. This fact can be used to speed up the search for CRCs that can detect specified sets of error patterns.

Let $A$ be a set of error patterns. As seen above, each $E(X) \in A$ can be written as $E(X) = X^a E^*(X)$, for some $a \ge 0$ and $E^*(0) = 1$. Let $A^*$ be the set of all such $E^*(X)$, i.e.,

$$A^* = \{E^*(X) : E^*(0) = 1,\ X^a E^*(X) \in A \text{ for some } a \ge 0\}$$

We must have $|A^*| \le |A|$. However, in general, it is not necessarily true that $A^* \subseteq A$. In particular, consider a CRC having total length of $l$ bits, and let $A$ be the set of all patterns of $m$ errors. We then have $|A| = \binom{l}{m} = O(l^m)$. Because $A^*$ is the set of all patterns of $m$ errors, under the restriction that the least significant bit of each of these error patterns is 1 [i.e., $E^*(0) = 1$], we must have $|A^*| = \binom{l-1}{m-1} = \Theta(l^{m-1})$.

A straightforward technique to show that a CRC of length $l$ will detect all the $m$-error patterns is to verify that each $m$-error pattern in $A$ is not a multiple of the CRC generator polynomial [10]. More specifically, for each $m$-error pattern

$$E(X) = X^{a_{m-1}} + X^{a_{m-2}} + \cdots + X^{a_1} + X^{a_0}$$

in $A$, we compute $R_{F(X)}[E(X)] = \sum_{i=0}^{m-1} R_{F(X)}[X^{a_i}]$. The error $E(X)$ is undetected iff $R_{F(X)}[E(X)] = 0$. The computation can also be implemented by table lookup [3]. We then have

$$R_{F(X)}[E(X)] = \sum_{i=0}^{m-1} T[a_i]$$

where the table $T[\,]$ is defined by $T[a] = R_{F(X)}[X^a]$. Overall, this brute-force technique has computational complexity $\binom{l}{m} = O(l^m)$.

As seen above, $E(X) \in A$ is undetected iff $E^*(X) \in A^*$ is undetected. Recall that $|A| = \binom{l}{m}$ and $|A^*| = \binom{l-1}{m-1}$. Thus, an alternative technique to show that a CRC of length $l$ will detect all the $m$-error patterns is to verify that each $m$-error pattern in $A^*$ is not a multiple of the CRC generator polynomial. This alternative technique has computational complexity $\binom{l-1}{m-1} = O(l^{m-1})$ and is faster than the brute-force technique by the factor $\binom{l}{m} / \binom{l-1}{m-1} = l/m = O(l)$. □
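The reduced search over $A^*$ can be sketched as follows. This is an illustrative implementation under stated assumptions (generator stored as a bit mask of degree $h \le 32$, table $T[a] = R_{F(X)}[X^a]$ held in an array named `rem`); the function names `all_detected` and `detects_all` are hypothetical and the code is not the authors' search program.

```c
#include <stdint.h>
#include <stdlib.h>

/* Recursively choose `left` more error positions in [start, l) and test
   whether every resulting pattern has a nonzero remainder. */
static int all_detected(const uint32_t *rem, int l, int left, int start, uint32_t acc)
{
    if (left == 0)
        return acc != 0;
    for (int a = start; a < l; a++)
        if (!all_detected(rem, l, left - 1, a + 1, acc ^ rem[a]))
            return 0;
    return 1;
}

/* Returns 1 if the CRC generated by f (bit mask, degree h <= 32, nonzero
   constant term) detects all m-error patterns in a codeword of l bits,
   0 if some pattern is undetected, and -1 on invalid input or allocation failure. */
int detects_all(uint64_t f, int h, int l, int m)
{
    if (h < 1 || h > 32 || l < 1 || m < 1 || m > l)
        return -1;
    uint32_t *rem = malloc((size_t)l * sizeof *rem);
    if (!rem)
        return -1;
    uint64_t r = 1;
    rem[0] = 1;                          /* X^0 mod f */
    for (int a = 1; a < l; a++) {
        r <<= 1;                         /* multiply by X */
        if ((r >> h) & 1) r ^= f;        /* reduce modulo f */
        rem[a] = (uint32_t)r;
    }
    /* Only patterns whose lowest error sits at position 0 (the set A*). */
    int ok = all_detected(rem, l, m - 1, 1, rem[0]);
    free(rem);
    return ok;
}
```

Fixing the lowest error position at 0 leaves $m - 1$ positions to enumerate, which is where the $O(l^{m-1})$ cost comes from. Calling `detects_all` for increasing $l$ until it first returns 0 gives one simple way to locate $l_m$ for small cases.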



Remark 18. Here, we explain how the CRC generator polynomials shown in Figs. 29 and 34-39 are found. We restrict our search of $h$-bit CRC generator polynomials to a subset of polynomials $S = \{F_0(X), F_1(X), \ldots, F_{n-1}(X)\}$, for some $n$, where each $F_j(X)$ is a polynomial of the form (86), i.e., it has degree $h$ and weight $k + 2$ (i.e., $F_j(X)$ has $k + 2$ terms). For example, if we restrict $S$ to be the set of polynomials of degree $h$ and weight 6 (i.e., $k = 4$), we then have $n = |S| = \binom{h-1}{4}$. Note that each polynomial $F_j(X)$ can also be represented by a binary integer $F_j$ whose digits are the coefficients of $F_j(X)$, i.e., $F_j = F_j(2)$. The polynomials in $S$ are arranged in increasing order, i.e., $F_{j_1} < F_{j_2}$ when $j_1 < j_2$.

Consider a CRC generated by $F_j(X)$ that has the form (86). Because any undetected error must be a multiple of $F_j(X)$, it follows that $l_i \ge h$ for all $i \ge 1$. In particular, $l_{k+2} = h$, because the error $E(X) = F_j(X)$, which has weight $k + 2$ and length $h + 1$, is undetected. We also have $l_1 = \infty$. Recall that $l_2$ is also the period of the polynomial. Here, we are only interested in $l_m$ for $2 \le m \le k + 1$. For example, when $k = 3$, i.e., $F_j(X)$ is a polynomial of weight 5, we are only interested in $l_2$, $l_3$, and $l_4$ (see Fig. 35). When $k$ is even, i.e., $F_j(X)$ is a polynomial of even weight, we have $l_m = \infty$ for odd $m$. In this case, we are only interested in $l_m$ for even $m$, i.e., for $m = 2, 4, \ldots, k$. For example, when $k = 4$, i.e., $F_j(X)$ is a polynomial of weight 6, we are only interested in $l_2$ and $l_4$ (see Fig. 29).

(a) Consider Fig. 29, which shows polynomials of weight 6, i.e., $k = 4$ and $F_j(X) = X^h + X^{i_4} + X^{i_3} + X^{i_2} + X^{i_1} + 1$. Here, we show $l_2$ and $l_4$. Each $F_j(X)$ generates a CRC that has minimum distance $d_{min} = 6$ when its total length $\le \min\{l_2, l_4\}$ bits. The polynomials are shown in increasing values of their binary representation and increasing $l_4$, i.e., the $l_4$ of $F_{j_1}(X)$ is smaller than the $l_4$ of $F_{j_2}(X)$ for $j_1 < j_2$. Although these CRCs can be implemented by the familiar basic technique, they can be implemented much faster by the new technique (15). Using the new technique, an $h$-bit CRC with smaller $l_4$ is at least as fast as an $h$-bit CRC with larger $l_4$. Thus, as expected, there are tradeoffs between code capability and speed, i.e., CRCs with smaller $l_4$ are faster than CRCs with larger $l_4$.

(b) As seen in Fig. 29, where we impose the condition fc = 4 (i.e., the generator polynomials have weight 
exactly 6), the generator polynomial X^^ + X^"^ + + X^ + X'^ + 1 yields the largest I4 = 130 for the 
case of ft, = 16. Can I4 be improved if we allow fc > 4? Consider Fig. 34, which shows the CRC generator 
polynomials with even k and fc > 4, i.e., each Fj{X) is a polynomial of even weight greater than 6. Our 
purpose here is to find out if there are other CRCs that have values of I4 that are larger than those in Fig. 29. 
Such a CRC generator exists for the case h = 16, namely, X'^^ + X^^ + x'^^ + X'^^ + X^ + X'^ + X + 1, with 
I4 = 152, which is larger than the largest I4 = 130 in Fig. 29. Note that, using the new technique (15), the 
CRCs in Fig. 29 are usually faster than those in Fig. 34, because they are generated by polynomials that 
have lower weights. 



(c) Figs. 35 and 36 show generator polynomials that have odd weights and generate CRCs that can detect 1, 2, 3, and 4 errors. We now show $l_2$, $l_3$ and $l_4$. In Fig. 35, we require $k = 3$ (i.e., the generator polynomials have weight exactly 5). In Fig. 36, we require that $k$ is odd and $k > 3$ (i.e., the generator polynomials have odd weights greater than 5).

(d) Figs. 38 and 39 show generator polynomials that have even weights and generate CRCs that can detect 1, 2, 3, 4, 5, 6, and 7 errors. We now show $l_2$, $l_4$ and $l_6$. In Fig. 38, we require $k = 6$ (i.e., the generator polynomials have weight exactly 8). In Fig. 39, we require that $k$ is even and $k > 6$ (i.e., the generator polynomials have even weights greater than 8). □
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CRC generator polynomial 


k 


I2 = period 


h — 1 

period 




X'* + X'^ + X2 + X + 1 


17 


30705 


1.06716 


X^^> + X5 + 


X + X + X + X + 1 


104 


3066 


10.6872 


+ X9 4 




X^ + X^ 4- X"* + X^ + 1 


128 


254 


129.004 




+ x" 


+ X^" 4- X^ + X^ + X* + 1 


130 


258 


127.004 




+ Xl2 


4 X " + X^ + X + X + 1 


152 


151 


217 




X'* + X^ + X2 4 X + 1 


25 


3145722 


2.66667 




X5 + 


X'* + X^ + X2 + X + 1 


231 


2796202 


3 




XS + 


X'* + X^ + X2 + X + 1 


243 


32385 


259.028 




X"'4-X'^+X^4X +1 


388 


3276 


2560.62 


+ + X*' + 


X + X + X^ + X + 1 


453 


1040130 


8.06496 




X'' + X^+X^+A4-l 


499 


8388607 


1 


js:24 + X9 4 


X8 + 


X^ + X^ 4- X^ + X 4- 1 


558 


2046 


4100 


X24 + 


+ X8 - 


- X® 4- X^ 4- X* + X2 + 1 


615 


8126433 


1.03226 


+ Xl" 


+ X9- 


- X + X + X + X + 1 


673 


8388604 


1 




+ x«- 


- X** + X*' + X''' + X'^ + X'"* + X + 1 


831 


32767 


256.008 




+ x" 


+ X'^ + X^ + X'' + X2 + 1 


2048 


4094 


2049 




+ xi« 


+ X-'* + X-"' 4 X** + X^ + 1 


2050 


4098 


2047 




X'* + X^ + X2 4 X + 1 


33 


2139094785 


1.00392 


;Sf32 _,_ J(^7 _,_ _,_ 


X'* + X'^ + X2 + X + 1 


1251 


38337390 


56.0154 




X'* + x-^ + X2 + X + 1 


1442 


66060162 


32.508 


+ X'' 4 


x^ + 


X^ + X"* 4- X2 4- X + 1 


4017 


2147483647 


1 


X32 + X" 


■f X9 - 


f X^ 4- X^ 4- X2 + X + 1 


4063 


2130706305 


1.00787 


X32+X11 


+ xi" 


+ X + X + X + X + X + X 4- 1 


4085 


2147483647 


1 


X32+X1-' 


^x*^ - 


- X'' + X* + X'^ + X + 1 


4241 


28703892 


74.8151 


X32 +X12 


+ xi" 


+ A +A +A +A +i 


4400 


1879048185 


1.14286 


X-^2 +Xl2 


+ x" 


+ X" + X** + X^ + X« + X"' + X3 + 1 


5012 


114681 


18725.7 


x:« +X13 


+ Xl2 


+ X** + X6 + X'' + X + 1 


5240 


102261126 


21 


^32 _^^13 


+ Xl2 


+ Xi" + X« + X^ + X'' + 1 


8222 


253921 


8457.29 


^32 _^^17 


+ Xi" 


+ Xi3 + xi" + X'' 4 X"* + X2 + X + 1 


8224 


253983 


8455.23 


X32+X18 


+ X14 


+ Xi3 + Xi2 + x" 4- X9 4- + X4 + 1 


16384 


32766 


65540 


X32 + Xl8 


+ Xl6 


+ Xi2 + X" + Xio 4- X* 4- + XS + X2 + X + 1 


32768 


65534 


32769 


X64 + X7 4 


x*^ + 


X^ + X3 + X2 + X + 1 


> 10"' 


1.92 X 10^2 


4.80 X 10** 


^64 + 4 


X8 + 


X'^ + X^ + X^ 4 4 ^2 4 X + 1 


> 10^ 


7.20 X lO^'* 


128.03126 



Fig. 34 CRC generator polynomials of even weights greater than 6. 





gt!iu!ra 


or 


)(.)iviiuiiiial 


/4 


/;', 


I2 = pc'iiod 


V. .1 
period 


Xl6 


4X3 + 


X2 


4X + 1 


17 


351 


57337 


1.14298 


Xl6 


4X^4 


X2 


4X + 1 


31 


121 


16383 


4.00018 


Xl6 


4X4 + 


X3 


4X41 


63 


235 


59055 


1.10973 


Xl6 


+ X" 4 


X2 


4X + 1 


68 


230 


57337 


1.14298 


Xl6 


4X5 4 


X3 


4X + 1 


104 


683 


21845 


3 


Xl6 


4X5 4 


X4 


4x241 


116 


121 


57337 


1.14298 


Xl6 


4X10- 


f X5 4 X3 4 1 


126 


317 


65535 


1 


X"' 


4- x " - 


fX 


* 4 X3 4 1 


1.30 


'OO 


381 


172.008 




-^x^ - ' - 


fX^ 4X-'' 4 1 


258 




2.j7 


2.j5 


X24 


4X3 4 


X2 


4X + 1 


25 


00 


4095 


4097 


X24 


4X^4 


X2 


4X4-1 


47 


7399 


5586603 


3.00312 


X24 


4X*4 


X3 


4X41 


533 


5839 


16777215 


1 


X24 


4X5 4 


X2 


4X41 


725 


1778 


5586603 


3.00312 


X24 


4X«' 4 


X2 


4X + 1 


841 


5531 


4194303 


4 


X24 


4X17- 


f X 


12_,_X7 + 1 


2050 


00 


6141 


2732 


X32 


4X3 4 


X2 


4X + 1 


33 


351 


469762041 


9.14286 


X32 


4 X* 4- X2 


4-X4-1 


63 


15873 


268435455 


16 


X32 


4 X* 4- X3 4- X + 1 


2250 


> 10^ 


77302995 


55.5602 


X32 


4X5 4 


X2 


4X41 


4345 


45868 


147436713 


29.1309 


X32 


4X^4 


X6 


4x241 


5281 


142741 


4294967295 


1 


X32 


4X21 _ 


f X 


16+xll+l 


65538 


00 


65537 


65535 


X64 


4X3 4X2 


4X-hl 


65 


> 10^ 


1.01 X IQi* 


18.2879 


X64 


4 X* 4- X3 


4-X-H 


> 10^ 


> 10^ 


264 _ 1 


1 



Fig. 35 CRC generator polynomials of weight 5. 
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CRC generator polynomial 
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h 


I2 = period 


2"-l 
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A^" - 


\- X^ -\ 


- X + X' + X + 


X + 1 






17 
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65535 


1 


■vl6 


\-X^ -\ 


- X + X^ + X + 
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95 


oo 
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_ 




- X^ +X"' + X^ 4- 
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96 
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1.03275 


- 


l-X''' -f 


- X^ + X* + X3 + 
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97 
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1 


■v-16 


F X" 4 


- X + X + X + 
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21845 


3 


X^^ - 




- X' + X*' + X** + 


X2 + 1 
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4369 


15 


X^^ ' 




^7' Y 1 "XT' 4 1 \7" '-^ 1 

- X ' + x^ + x^ + 
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oo 


381 
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vl6 

A^" - 


- X 
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oo 
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- X'-^ 
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+ X* + 1 
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oo 


257 
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- 


1- -f 
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25 
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■v24 


1- A° -f 


-X*+A'^+X^ + 


X + 1 
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4317 
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3.14631 


A'^^ - 


\-X^ -\ 


- X° + X + x^ + 


X + 1 






746 


2254 


29127 


576.002 


■v-24 
X^^ - 


-f 


- X® + x^ + x^ + 


X + 1 






788 


8703 


16777215 


1 


■v-24 


- X 


+ X + X + x° - 


- X^ + X + 


X 4- 


1 
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5687 


5586603 


3.00312 


■v-24 


- A 


+ X + X + x° - 


- X° + X + 


x^ 


■f 1 


901 


10751 


5592405 


3 


X^^ - 


x^l2 
- X 


+ X*^ + X^ + X'' - 


f X" + X3 + 


X 


+ 1 


919 


6297 


13762455 


1.21906 


■v24 
A^^ - 


- X 


+ X + X + X 


+ X + X ' 


+ X 


+ X^ + X + 1 


2050 


oo 


6141 


2732 


-v24 


- X 


+ X + X + X 


+ X + 1 






4096 


oo 


4095 


4097 


■v24 
A^^ - 


- X 


+ X + X + X 


_|_ _|_ I 






4098 


oo 


4097 


4095 


X-^^ - 


|-X5 -f 


- X** + X3 + X2 + 


X + 1 






33 


> 10« 


44695211 


96.0946 


X"^^ - 


<rX^ -\ 


- X* + X3 + x^ + 


X + 1 






2295 


202045 


94972251 


45.2234 


vS2 
X^'^ - 


l-XO -f 


- X° + X* + x^ + 


X + 1 






3103 


96097 


4286578177 


1.00196 


X32. 




- X" + X'' + X2 + 


X + 1 






3831 


220463 


1073741823 


4 




hXM 


-X" +X4 +x'' + 


X + 1 






3960 


92515 


1073741823 


4 


X32. 


FXM 


-Xl^ +X4 +X2 + 


X + 1 






3972 


38335 


3758096377 


1.14286 


X32_ 




-X'' +X5 +X'' + 


X3 + 1 






4380 


32768 


153391689 


28 


X32- 




- X^ + X"' + x" + 


X2 + 1 






5345 


115188 
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1.00049 


X32- 


l-X" 


+ X^ + X"' + x-^ - 


hX + l 






5617 


141304 


107374182 


40 


X32- 


I-X12 


+ X^ + X'' + x^ - 


f X3 + 1 






5820 


27707 


402653181 


10.6667 


X32- 


I-X12 


+ X11 +X9 +X7 


+ X3+X2- 


i-x- 


f 1 


65536 


oo 


65535 


65537 


^64 + x6 + + X3 + X2 + X + 1 






65 


> 10« 


1.79 X IOI8 


1.0323 




|-X6 -t 


- X* + X3 + X2 + X + 1 






> 10^ 


> 10^ 


240 _ 1 


1.68 X lO'' 



Fig. 36 CRC generator polynomials of odd weights greater than 5. 
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CRC generator polynomial 
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. 37 Primitive CRC generator polynomials of weight 
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CRC generator polynomial 
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8 
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X32 


+ Xi4 
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3.42857 


X32 


+ Xl4 


+ Xi3 


+ x 


' + X-"' + X"' + X + 1 
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2347 
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X32 


+ Xl6 


+ Xi5 


4-X 


'" +X« +X^ + X + 1 
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3298 


2147483644 


1 


X32 


+ X2" 


+ xi» 


+ x 


15 + j^l2 _^ 


Xis 4- X"' + 1 
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287460210 
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X32 


+ X21 


+ xw 


+ x 


'»+X'7 + 


Xi2 4- 1 
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2146433025 
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X32 
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4-X19 


4-X 
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X^ 4- X3 4- 1 


320 


2711 
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Fig. 38 CRC generator polynomials of weight 8. 
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CRC generator polynomial 


l6 


k 


h = period 


2'''-'--l 
period 


1^ 1 fi 


- X^ ~ 


fX7 + 


X^ + X5 + X'' + X^ + X2 + X + 1 


18 


17 


584 
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A^" - 


- X^^' 


+ X8- 
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18 


45 
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22 


55 
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- 
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29 


67 
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30 


36 


63 
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31 
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31 
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26 


25 


95480 
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26 
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1 
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65 
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88 
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90 
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+ Xio 
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92 
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+ xi" 
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93 
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1.152 
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+ X9 + X** + X^ + X*' + X"" + X^ + X2 + X + 1 


97 


223 
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1 
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34 


33 
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34 
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8 
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f x'^ + xfs + X5 + X"" + X2 + X + 1 
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l-X" 


+ Xio 


+ X'^ + X^* + X5 + X^ + X-' + X + 1 
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X32- 


l-X" 


+ xi" 


+ X9 + X8 + X ' + X'' + X* + X2 + 1 


259 
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48.3781 
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+ x" 
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l-X" 
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I-X14 


+ X12 
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3249 
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1.01587 


X32 + XlS 


+ Xl3 


+ Xi2 + + X* + X'' + X6 + xs + 1 
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237198535 
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X32 


+ x" 


+ X" + Xi2 + + X* + X'' + X6 + X5 + X2 + 1 
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2314 


2113929153 


1.01587 



Fig. 39 CRC generator polynomials of even weights greater than 8. 



APPENDIX D CRC WEIGHT DISTRIBUTIONS 

We now briefly present the computation of the weight distributions of CRCs, which are used for computing 
the undetected error probability of CRCs over binary symmetric channels (BSCs). A BSC is specified by 
the requirement $\Pr(0|1) = \Pr(1|0)$, where $\Pr(j|i)$ is the conditional probability that bit $j$ is received when
bit $i$ is transmitted. The value $p = \Pr(0|1) = \Pr(1|0)$ is called the transition probability of the BSC.

Given a code of length $l$, the sequence $(w_0, w_1, \ldots, w_l)$ is called the weight distribution of the code, where
$w_m$ is the number of codewords of weight $m$. Note that $w_m = 0$ when $0 < m < d_{\min}$, where $d_{\min}$ is the minimum
distance of the code. The determination of the weight distribution of a code in general is an NP-hard
problem [8]. The undetected error probability $p_u$ of a code over a BSC with transition probability $p$ is given
by [8, 12]

$$p_u = \sum_{m=1}^{l} w_m p^m (1-p)^{l-m} = \sum_{m=d_{\min}}^{l} w_m p^m (1-p)^{l-m} \qquad (87)$$

In the following we present CRC weight distributions obtained by computer search. Mathematical studies
of the weight distributions and the undetected error probability of codes are presented in [8].

Consider a CRC that has length $l$ and is generated by a polynomial $F(X)$, which is not a multiple of $X$.
A polynomial $E(X)$ is a codeword of this CRC if $E(X)$ is a multiple of $F(X)$, i.e., $R_{F(X)}[E(X)] = 0$. This
fact can be used to compute CRC weight distributions. Note that $w_0 = 1$. If $F(X)$ has even weight, then a
polynomial of odd weight cannot be a codeword, i.e., $w_m = 0$ for odd $m$.

For $0 < m \le l$, let $A_m$ be the set of polynomials of degrees $< l$ and weight $m$, i.e.,

$$A_m = \{X^{a_{m-1}} + X^{a_{m-2}} + \cdots + X^{a_1} + X^{a_0} : l > a_{m-1} > a_{m-2} > \cdots > a_1 > a_0 \ge 0\} \qquad (88)$$

We have $|A_m| = \binom{l}{m}$. Let $B_m$ be the set of CRC codewords in $A_m$, i.e.,

$$B_m = \{E(X) \in A_m : R_{F(X)}[E(X)] = 0\} \qquad (89)$$

We have $w_m = |B_m|$. Thus, a direct technique for computing $w_m$ is to count the number of polynomials
in $A_m$ that are multiples of $F(X)$ (cf. [3]). Because $|A_m| = \binom{l}{m} = O(l^m)$, this direct technique has
computational complexity $O(|A_m|) = O(l^m)$ (cf. [3]). A faster technique for computing $w_m$, which has
computational complexity $O(l^{m-1})$, is presented in Remark 19. Using this faster technique, we obtain the
values of $w_m$ for $h$-bit CRCs as shown in Figs. 40-43 for $m = 4, 6, 8$, and $h = 8, 16, 24, 32, 64$. Note that
$w_m = 0$ for odd $m$ and $w_2 = 0$, because the generator polynomials in these figures have weight 4 (i.e., even
weight) and minimum distance $d = 4$ at the indicated code lengths.
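
The direct counting technique described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used to produce the figures: the helper names `power_residues` and `count_codewords_direct` are hypothetical, polynomials are stored as integers whose bit $i$ is the coefficient of $X^i$, and $F(X)$ is assumed to have degree at least 1.

```python
from itertools import combinations


def power_residues(f, l):
    """Return [X^a mod F(X) for a = 0..l-1]; bit i of an int is the X^i coefficient."""
    h = f.bit_length() - 1          # degree of F(X), assumed >= 1
    residues, r = [], 1
    for _ in range(l):
        residues.append(r)
        r <<= 1                     # multiply by X
        if (r >> h) & 1:            # degree reached h: subtract (XOR) F(X)
            r ^= f
    return residues


def count_codewords_direct(f, l, m):
    """w_m for code length l: test every polynomial of weight m and degree < l."""
    residues = power_residues(f, l)
    count = 0
    for exponents in combinations(range(l), m):
        remainder = 0
        for a in exponents:
            remainder ^= residues[a]
        if remainder == 0:          # multiple of F(X), hence a codeword
            count += 1
    return count


FAST_CRC8 = 0b100000111             # X^8 + X^2 + X + 1
print(count_codewords_direct(FAST_CRC8, 10, 4))   # prints 3
```

The loop visits all $\binom{l}{m}$ exponent sets, which matches the $O(l^m)$ cost stated above. For $l = 10$ the only nonzero codewords are $F(X)$, $XF(X)$, and $(X+1)F(X)$, all of weight 4, so the printed value agrees with the first row of Fig. 40.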

Fig. 40 shows $w_4$, $w_6$, and $w_8$ for the fast 8-bit CRC generated by $X^8 + X^2 + X + 1$ (which is also the
ATM CRC-8) and for the 8-bit CRC generated by $X^8 + X^5 + X^4 + 1$ (used in the 1-Wire bus). The results show
that these 2 CRCs have similar $w_4$, $w_6$, and $w_8$.

Fig. 41 shows $w_4$ for the fast 16-bit CRC generated by $X^{16} + X^2 + X + 1$, the CRC-CCITT generated
by $X^{16} + X^{12} + X^5 + 1$, and the CRC-16 generated by $X^{16} + X^{15} + X^2 + 1$. The results show that (a) for
$l < 1000$, $w_4$ is smallest for the CRC-CCITT, largest for the CRC-16, and in-between for the fast 16-bit CRC,
and (b) for $l > 2000$, all 3 CRCs have similar $w_4$.

Fig. 42 shows that $w_4$ for the fast 32-bit CRC generated by $X^{32} + X^2 + X + 1$ is larger than $w_4$ for the
32-bit CRC generated by X^"^ + X^^ -|- -|- 1 (which is proposed in [5]). 

Fig. 43 shows that, for $l > 200$, $w_4$ for the fast 64-bit CRC generated by $X^{64} + X^2 + X + 1$ is smaller
than W4 for the 64-bit CRC generated by X'^^ + X^^ +X'^ + 1 (which is proposed in [5]). 

We now consider a first-order estimate for the undetected error probability by assuming that the first
term of (87), $w_{d_{\min}} p^{d_{\min}} (1-p)^{l-d_{\min}}$, is much larger than all the other terms. A simple estimate, which is
reasonable when $lp \ll 1$, for the undetected error probability for a BSC is then

$$p_u \approx w_{d_{\min}} p^{d_{\min}} \qquad (90)$$

For example, suppose that the fast 32-bit CRC generated by $X^{32} + X^2 + X + 1$ is used to protect a
3000-bit codeword over a BSC with transition probability $p$. Because $d_{\min} = 4$ and $l = 3000$, Fig. 42 yields
$w_{d_{\min}} = w_4 = 1.855 \times 10^5$. Using (90), we then have $p_u \approx 1.855 \times 10^5 p^4$. In particular, pu ~ 1.855 x 10"^^
when p — 10^^, and Pu ~ 1.855 x 10~^^ when p — 10^^. The undetected error probability p„ for the BSC 
can be reduced much further by using CRCs with $d_{\min} > 4$. These CRCs are presented in Section C.4,
e.g., Fig. 29 shows many generator polynomials for 32-bit CRCs that have $d_{\min} = 6$ when $l \le l_4$. Although
these CRCs can be efficiently implemented by the fast technique (15), they are not as fast as the fast CRC
generated by $X^{32} + X^2 + X + 1$. Note that the undetected error probabilities $p_u$ given in (87) and (90) are
for BSCs, and may not be valid for other types of channels.
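
As a rough numerical illustration of (87) and (90), the sketch below evaluates both expressions. The function names are hypothetical, and the transition probabilities ($10^{-3}$ and $10^{-5}$) are assumed values chosen only for illustration. The small weight distribution is that of the fast 8-bit CRC at length $l = 10$ ($w_0 = 1$, $w_4 = 3$, all other $w_m = 0$), and $1.855 \times 10^5$ is the value of $w_4$ quoted above for $l = 3000$.

```python
def undetected_error_probability(weights, p):
    """Exact p_u from (87); weights[m] = w_m for m = 0..l."""
    l = len(weights) - 1
    return sum(w * p**m * (1 - p)**(l - m)
               for m, w in enumerate(weights) if m >= 1)


def first_order_estimate(w_dmin, d_min, p):
    """First-order estimate (90): p_u ~ w_dmin * p**d_min, reasonable when l*p << 1."""
    return w_dmin * p**d_min


# Fast 8-bit CRC at l = 10: the only nonzero codewords are 3 of weight 4.
weights = [1, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0]
p = 1e-3                                   # assumed transition probability
exact = undetected_error_probability(weights, p)
estimate = first_order_estimate(3, 4, p)
print(exact, estimate)                     # the two values differ by well under 1%

# Fast 32-bit CRC at l = 3000 with w_4 = 1.855e5, for an assumed p = 1e-5.
print(first_order_estimate(1.855e5, 4, 1e-5))
```

Because (90) drops the factor $(1-p)^{l-d_{\min}}$ and all higher-weight terms, it is only as good as the assumption $lp \ll 1$.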
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X + 1 




          X^8 + X^2 + X + 1                      X^8 + X^5 + X^4 + 1
   l      w_4         w_6         w_8            w_4         w_6         w_8
  10   3.000e+00   0.000e+00   0.000e+00      2.000e+00   1.000e+00   0.000e+00
  20   3.900e+01   2.870e+02   1.029e+03      4.300e+01   2.820e+02   1.011e+03
  50   1.833e+03   1.241e+05   4.195e+06      1.813e+03   1.244e+05   4.192e+06
 100   3.136e+04   9.304e+06   1.454e+09      3.135e+04   9.305e+06   1.454e+09
   …   8.268e+04   4.035e+07   1.047e+10      8.268e+04   4.035e+07   1.047e+10



Fig. 40 $w_4$, $w_6$, and $w_8$ for the CRCs generated by $X^8 + X^2 + X + 1$ and $X^8 + X^5 + X^4 + 1$
($w_m$ = number of codewords of weight $m$).
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X^ii + X'^ +X + 1 


Xiii j^x^ + 1 


jfl6 _,_^15 j^x'^ +1 
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1.289C+03 
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3.836C+03 
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1.345C+04 


9.478e+03 


2.826C+04 
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3.839e+04 
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6.960e+04 
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8.656e+04 


7.587e+04 


1.326e+05 
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2.568e+05 
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5.177C+05 


6.927C+05 


900 
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8.344C+05 
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1.276e+06 


1.473C+06 
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2.085e+07 
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1.030e+08 


1.036e+08 
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3.254e+08 


3.252e+08 


3.256e+08 
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7.940e+08 


7.938e+08 


7.943e+08 
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1.646e+09 


1.646e+09 


1.647e+09 
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3.050e+09 


3.050e+09 


3.050e+09 


8000 


5.204C+09 


5.204C+09 


5.204C+09 


9000 


8.336C+09 


8.337C+09 


8.336C+09 
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1.271C+10 


1.271C+10 


1.271C+10 


11000 


1.861C+10 


1.861C+10 


1.861C+10 
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2.635C+10 


2.635C+10 


2.635e+10 
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3.630e+10 


3.630e+10 


3.630e+10 
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4.883e+10 


4.883e+10 


4.883e+10 


15000 
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6.435e+10 


6.435e+10 
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2.034e+ll 


2.034e+ll 


2.034e+ll 



Fig. 41 $w_4$ for the CRCs generated by $X^{16} + X^2 + X + 1$, $X^{16} + X^{12} + X^5 + 1$, and $X^{16} + X^{15} + X^2 + 1$.





XS2 _^_x2 +X + 1 


XS2 _,_ x3i ^x^ + 1 


I 


W4, 


W4, 


100 


2.820e+02 


1.040e+02 


200 


1.276e+03 


4.560e+02 


300 


2.648C+03 


1.016C+03 


400 


4.264C+03 


1.816C+03 


500 


5.964C+03 


2.636C+03 


600 


8.245C+03 


3.736C+03 


700 


1.064C+04 


4.936C+03 


800 


1.308e+04 


6.136e+03 


900 


1.558e+04 


7.336e+03 


1000 
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4000 
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5000 
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6000 
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2.007e+05 
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8000 
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9000 
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12000 


3.086C+06 
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3.668C+06 
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14000 


4.291C+06 


1.332C+06 


15000 


4.976e+06 


1.625e+06 


20000 


1.017e+07 


4.026e+06 



Fig. 42 W4 for the CRCs generated by X^"^ + + X + 1 and ^ ^31 + 
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Xfi-i +X + 1 


      l    w_4 (fast 64-bit CRC)    w_4 (64-bit CRC of [5])
    100        7.100e+01                 3.600e+01
    200        5.660e+02                 5.720e+02
    300        1.440e+03                 2.200e+03
    400        2.556e+03                 4.781e+03
    500        3.756e+03                 7.939e+03
    600        5.304e+03                 1.252e+04
    700        6.904e+03                 1.771e+04
    800        8.536e+03                 2.328e+04
    900        1.024e+04                 2.920e+04
   1000        1.194e+04                 3.525e+04
   2000        3.330e+04                 1.266e+05
   3000        6.188e+04                 2.659e+05
   4000        9.181e+04                 4.253e+05
   5000        1.509e+05                 8.531e+05
   6000        2.139e+05                 1.346e+06
   7000        2.786e+05                 1.870e+06
   8000        3.436e+05                 2.408e+06
   9000        4.449e+05                 3.385e+06
  10000        5.569e+05                 4.468e+06
  11000        6.689e+05                 5.573e+06
  12000        7.809e+05                 6.685e+06
  13000        8.973e+05                 8.019e+06
  14000        1.015e+06                 9.381e+06
  15000        1.133e+06                 1.075e+07
      …        1.901e+06                 2.034e+07



Fig. 43 W4 for the CRCs generated by X^* + + X + 1 and X^^ + X^^ + + 1. 

Remark 19. Recall from (88) that $A_m$ is the set of polynomials of degrees $< l$ and weight $m$. Let $A_m^*$
be the subset of polynomials in $A_m$ whose lowest-order terms are 1, i.e., $X^{a_0} = 1$ or $a_0 = 0$. Thus, each
member of $A_m^*$ has the form

$$E^*(X) = X^{a_{m-1}} + X^{a_{m-2}} + \cdots + X^{a_1} + 1$$

where $l > a_{m-1} > a_{m-2} > \cdots > a_1 \ge 1$. We have $E^*(0) = 1$ because $X^{a_0} = 1$. Thus,

$$A_m^* = \{E^*(X) \in A_m : E^*(0) = 1\}$$

We have $|A_m^*| = \binom{l-1}{m-1}$. Let $B_m^*$ be the set of CRC codewords in $A_m^*$, i.e.,

$$B_m^* = \{E^*(X) \in A_m^* : R_{F(X)}[E^*(X)] = 0\}$$

It can be shown that $B_m^* = \{E^*(X) \in B_m : E^*(0) = 1\}$, where $B_m$ as defined in (89) is the set of CRC
codewords in $A_m$. We then have $B_m^* \subset A_m^* \subset A_m$ and $B_m^* \subset B_m \subset A_m$.

For each $E^*(X) \in B_m^*$, define

$$C_{E^*(X)} = \{X^a E^*(X) : a = 0, 1, \ldots, l - \deg(E^*(X)) - 1\}$$

We have $|C_{E^*(X)}| = l - \deg(E^*(X))$.

Let $E(X) \in B_m$. We then have $E(X) = X^a E^*(X)$, for some $a \ge 0$ and some $E^*(X) \in B_m^*$, because the
generator polynomial $F(X)$ is not a multiple of $X$, i.e., $F(0) = 1$. We have $0 \le a \le l - \deg(E^*(X)) - 1$,
because $l - 1 \ge \deg(E(X)) = a + \deg(E^*(X))$. We then have

$$B_m \subset \bigcup_{E^*(X) \in B_m^*} C_{E^*(X)}$$

Because $C_{E^*(X)} \subset B_m$ for each $E^*(X) \in B_m^*$, we have

$$\bigcup_{E^*(X) \in B_m^*} C_{E^*(X)} \subset B_m$$

Thus,

$$B_m = \bigcup_{E^*(X) \in B_m^*} C_{E^*(X)}$$

Because $E^*(X)$ is not a multiple of $X$, it can be shown that $C_{E^*(X)} \cap C_{E^*(X)'} = \emptyset$ when $E^*(X) \ne E^*(X)'$.
That is, $\{C_{E^*(X)} : E^*(X) \in B_m^*\}$ is a partition of $B_m$. Thus,

$$|B_m| = \sum_{E^*(X) \in B_m^*} |C_{E^*(X)}|$$

Because $w_m = |B_m|$ and $|C_{E^*(X)}| = l - \deg(E^*(X))$, we have

$$w_m = \sum_{E^*(X) \in B_m^*} [l - \deg(E^*(X))]$$

Thus, $w_m$ is computed by adding the numbers $[l - \deg(E^*(X))]$ for all polynomials $E^*(X) \in B_m^*$.
Because the polynomials $E^*(X)$ are those of $A_m^*$ that are multiples of $F(X)$, they can be found in $O(|A_m^*|)$
steps. Finally, $w_m$ can be computed in $O(l^{m-1})$ steps, because $|A_m^*| = \binom{l-1}{m-1} = O(l^{m-1})$. □
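
The counting argument of Remark 19 translates into a short Python sketch. It is an illustration under stated assumptions rather than the search program behind Figs. 40-43: the names `power_residues` and `count_codewords_fast` are hypothetical, polynomials are integers whose bit $i$ is the coefficient of $X^i$, $F(X)$ has degree at least 1 with $F(0) = 1$, and $m \ge 1$.

```python
from itertools import combinations


def power_residues(f, l):
    """Return [X^a mod F(X) for a = 0..l-1]; bit i of an int is the X^i coefficient."""
    h = f.bit_length() - 1          # degree of F(X), assumed >= 1
    residues, r = [], 1
    for _ in range(l):
        residues.append(r)
        r <<= 1                     # multiply by X
        if (r >> h) & 1:
            r ^= f                  # reduce modulo F(X)
    return residues


def count_codewords_fast(f, l, m):
    """w_m via Remark 19: enumerate only polynomials with constant term 1."""
    residues = power_residues(f, l)
    total = 0
    # Choose the m-1 nonzero exponents a_1 < ... < a_{m-1} from 1..l-1.
    for exponents in combinations(range(1, l), m - 1):
        remainder = 1               # the fixed term X^0
        for a in exponents:
            remainder ^= residues[a]
        if remainder == 0:          # E*(X) is a codeword with E*(0) = 1
            degree = exponents[-1] if exponents else 0
            total += l - degree     # E*(X) and its shifts X^a E*(X) of degree < l
    return total


print(count_codewords_fast(0b100000111, 10, 4))   # prints 3
```

Only $\binom{l-1}{m-1}$ candidates are examined, and each codeword found stands for all of its shifts that still fit in the code length, which is where the saving over the direct technique comes from.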



APPENDIX E CRC PARALLEL IMPLEMENTATION 

Given a CRC, which is generated by a polynomial $M(X)$ of degree $h$, our goal is to compute the check
$h$-tuple $P(X)$ to protect an input message $U(X) = (Q_0(X), \ldots, Q_{n-1}(X))$, where $Q_i(X)$ is an $s$-tuple.

So far, it is implicitly assumed that the CRC algorithms are for sequential implementation. That is, the
entire input message $U(X)$ is supplied to a single processor of a computer, and the output $P(X)$ is then
computed by this same processor. Following the technique in [7], we can modify these CRC algorithms for
parallel implementation on $k$ different processors of a computer, $k > 1$, as follows.

First, the input message $U(X)$ is divided into $k$ sub-messages $E_0(X), \ldots, E_{k-1}(X)$, i.e.,

$$U(X) = (E_0(X), \ldots, E_{k-1}(X))$$

where $E_i(X)$ consists of $n_i$ $s$-tuples. Thus, $n = n_0 + \cdots + n_{k-1}$. Define

$$W_i(X) = R_{M(X)}\left[X^{(n_{i+1} + \cdots + n_{k-1})s}\right] \qquad (91)$$

for $0 \le i \le k-2$, and $W_{k-1}(X) = 1$. Note that $W_i(X)$ is computed from $X^{(n_{i+1} + \cdots + n_{k-1})s}$, which is used
to determine the relative position of sub-message $E_i(X)$ in $U(X)$ (see Remark 20).

Next, for each $i = 0, 1, \ldots, k-1$, input sub-message $E_i(X)$ is supplied to processor $i$, which is used to
compute the following $h$-tuples:

$$P_i(X) = R_{M(X)}\left[E_i(X) X^h\right] \qquad (92)$$

$$Z_i(X) = R_{M(X)}\left[P_i(X) W_i(X)\right] \qquad (93)$$

where $W_i(X)$ is defined by (91). Note that $P_i(X)$ is the CRC check tuple computed by processor $i$ for sub-message
$E_i(X)$. For each $i = 0, 1, \ldots, k-1$, we assume that processor $i$ computes $P_i(X)$ and $Z_i(X)$ in (92)
and (93), independent of other processors, i.e., the computation is done in parallel by the $k$ processors.

Theorem 5. The tuples $Z_i(X)$, $0 \le i < k$, which are computed in parallel by the $k$ processors, are combined
to yield the final CRC check $h$-tuple $P(X)$ for the entire input message $U(X)$, i.e.,

$$P(X) = \sum_{i=0}^{k-1} Z_i(X) \qquad (94)$$

Proof. In polynomial notation, we have

$$U(X) = \sum_{i=0}^{k-2} E_i(X) X^{(n_{i+1} + \cdots + n_{k-1})s} + E_{k-1}(X)$$

The CRC check tuple $P(X)$ for $U(X)$ then becomes

$$P(X) = R_{M(X)}\left[U(X) X^h\right]$$

$$= \sum_{i=0}^{k-2} R_{M(X)}\left[E_i(X) X^h X^{(n_{i+1} + \cdots + n_{k-1})s}\right] + R_{M(X)}\left[E_{k-1}(X) X^h\right]$$

$$= \sum_{i=0}^{k-2} R_{M(X)}\left[P_i(X) W_i(X)\right] + P_{k-1}(X)$$

$$= \sum_{i=0}^{k-1} R_{M(X)}\left[P_i(X) W_i(X)\right]$$

$$= \sum_{i=0}^{k-1} Z_i(X)$$

□
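
Theorem 5 can be checked numerically with a small bit-level sketch. The code below is an assumption-laden illustration, not the paper's implementation: the function names are hypothetical, polynomials are Python integers whose bit $i$ is the coefficient of $X^i$, each sub-message is given as a `(value, bit_length)` pair, and the $k$ "processors" are simulated by an ordinary loop rather than real parallel execution.

```python
def poly_mod(a, m):
    """Remainder R_M[a] of binary polynomial a modulo M (ints as bit vectors)."""
    h = m.bit_length() - 1
    while a.bit_length() > h:
        a ^= m << (a.bit_length() - 1 - h)   # cancel the leading term
    return a


def poly_mul(a, b):
    """Carry-less (GF(2)) product of two binary polynomials."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a <<= 1
        b >>= 1
    return product


def sequential_crc(message, m):
    """P(X) = R_M[U(X) X^h] computed on the whole message."""
    h = m.bit_length() - 1
    return poly_mod(message << h, m)


def parallel_crc(sub_messages, m):
    """Combine per-sub-message results as in (91)-(94)."""
    h = m.bit_length() - 1
    check, shift = 0, 0
    # Walk from E_{k-1} back to E_0; shift holds (n_{i+1}+...+n_{k-1})s in bits.
    for value, nbits in reversed(sub_messages):
        w = poly_mod(1 << shift, m)          # W_i(X), equation (91)
        p = poly_mod(value << h, m)          # P_i(X), equation (92)
        z = poly_mod(poly_mul(p, w), m)      # Z_i(X), equation (93)
        check ^= z                           # summation (94) over GF(2)
        shift += nbits
    return check


M = (1 << 32) | 0b111                        # X^32 + X^2 + X + 1
parts = [(0x0123, 16), (0x4567, 16), (0x89AB, 16), (0xCDEF, 16)]
print(parallel_crc(parts, M) == sequential_crc(0x0123456789ABCDEF, M))  # prints True
```

In a real deployment each loop iteration would run on its own processor, with only the final XOR performed centrally.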



We now determine the total CRC computation time, denoted by $t_{\text{total}}$, for the parallel technique. First,
let $t_{W_i}$, $t_{P_i}$, and $t_{Z_i}$ be the times for processor $i$ to compute $W_i(X)$, $P_i(X)$, and $Z_i(X)$, respectively. Let
$t_P$ be the time for the computer to compute the summation (94). We can consider $t_{W_i}$, $t_{Z_i}$, and $t_P$ as the
overhead costs for the CRC parallel implementation. Because the $k$ processors compute (92) and (93) in
parallel, the total time for the computer to compute the final CRC check tuple $P(X)$ is

$$t_{\text{total}} = t_{W_i} + \max\{t_{P_i} + t_{Z_i} : 0 \le i < k\} + t_P \qquad (95)$$

We now determine the speedup factor for the parallel technique under the following ideal conditions:
(a) the $k$ processors have identical computational capability, (b) the sub-messages $E_i(X)$ have the same
length, i.e., $n_i = n/k$, and (c) the overhead costs $t_{W_i}$, $t_{Z_i}$, and $t_P$ are negligible compared to $t_{P_i}$, i.e.,
$t_{W_i} + t_{P_i} + t_{Z_i} + t_P \approx t_{P_i}$ (see Remark 20). From (95), we then have $t_{\text{total}} \approx t_{P_i} \approx t_U / k$, where $t_U$ denotes
the time for a single processor to compute the CRC check tuple $P(X)$ for the entire message $U(X)$, i.e., $t_U$ is
the CRC computational time for sequential implementation. Thus, under the ideal conditions, the speedup
factor is approximately $k$ for parallel implementation.

Remark 20. Under the CRC parallel implementation, processor $i$ computes $W_i(X)$, $P_i(X)$, and $Z_i(X)$
as given in (91)-(93), $i = 0, 1, \ldots, k-1$. These tuples can be computed as follows. First, it can be shown
from (91) that

$$W_i(X) = R_{M(X)}\left[X^{n_{i+1}s} W_{i+1}(X)\right]$$

with $W_{k-1}(X) = 1$. Thus, once $W_{i+1}(X)$ is known, $W_i(X)$ can be computed in $O(n_{i+1}s)$ steps (by Remark 1).
We can also write $W_i(X) = R_{M(X)}\left[X^{n_{i+1}s-h} W_{i+1}(X) X^h\right]$, i.e., we can view $W_i(X)$ as the output check tuple
of the CRC generated by $M(X)$ when $X^{n_{i+1}s-h} W_{i+1}(X)$ is the input tuple. Thus, $W_i(X)$ can be computed
by either the CRC basic technique or the CRC new technique. Suppose now that $n_0, \ldots, n_{k-1}$ are known
and fixed. The tuples $W_0(X), W_1(X), \ldots, W_{k-1}(X)$ can then be stored in a table defined by $T[i] = W_i(X)$,
$i = 0, 1, \ldots, k-1$ (cf. [7]). Next, processor $i$ can use either the basic technique or the new technique to
compute the (partial) CRC check tuple $P_i(X)$. Further, using the technique "Mimic long multiplication as
done by hand" in [13, p. 90], it can be shown that the tuple $Z_i(X) = R_{M(X)}[P_i(X) W_i(X)]$ can be computed
in $O(h)$ steps. Finally, once $Z_0(X), \ldots, Z_{k-1}(X)$ are computed by the $k$ processors, their summation in (94)
can be quickly computed. Thus, for a sufficiently long sub-message $E_i(X)$ along with the use of table lookup
for determining $W_i(X)$, the computational complexity of $P_i(X)$ is much greater than that of $W_i(X)$, $Z_i(X)$,
and the summation (94), i.e., $t_{P_i} \gg t_{W_i}, t_{Z_i}, t_P$. □
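
The backward recursion for $W_i(X)$ in Remark 20 lends itself to a precomputed table. The following sketch builds such a table under stated assumptions: `position_table` and `poly_mod` are hypothetical names, polynomials are integers whose bit $i$ is the coefficient of $X^i$, and the sub-message lengths $n_0, \ldots, n_{k-1}$ (in $s$-tuples) are taken to be known and fixed in advance.

```python
def poly_mod(a, m):
    """Remainder of binary polynomial a modulo M (ints as bit vectors)."""
    h = m.bit_length() - 1
    while a.bit_length() > h:
        a ^= m << (a.bit_length() - 1 - h)   # cancel the leading term
    return a


def position_table(n, s, m):
    """Return T with T[i] = W_i(X), built backwards from W_{k-1}(X) = 1."""
    k = len(n)
    table = [0] * k
    table[k - 1] = 1
    for i in range(k - 2, -1, -1):
        # W_i(X) = R_M[ X^(n_{i+1} s) * W_{i+1}(X) ]
        table[i] = poly_mod(table[i + 1] << (n[i + 1] * s), m)
    return table


M = (1 << 32) | 0b111                        # X^32 + X^2 + X + 1
T = position_table([3, 5, 2, 4], 8, M)       # four sub-messages of 8-bit tuples
print([hex(w) for w in T])
```

Each entry costs one reduction of a polynomial of degree below $h + n_{i+1}s$, and the whole table is reusable for every message that shares the same partition.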
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